Here is the starting position of my variant. In practice it is essentially equal (white wins about 52% of games), but engines don't understand it and evaluate it at around +30 for black. This exact position has been played a few thousand times. As the game continues and white gains more territory, the evaluation starts going haywire, and through most of the middlegame the engine's suggested move is rarely an improvement on a move chosen at random.
Here is a completely drawn position in which black is down 6 pawns.