This is not hard to figure out. Stockfish 8 was the engine that played AlphaZero.
So the best way to compare other NN chess engines is to have them also play Stockfish 8. This is why we know two things.
1. AlphaZero's win was legit.
Your number 1 point absolutely does not follow. I have no idea what logic you see in there, but it doesn't exist. AlphaZero is gone. We have no chance to figure out how strong it really was, unless we accept the word of Google, who have no reason to tell the truth and every reason to bend it.
To show, again, that AlphaZero's win was LEGIT, that is why.
And Leela Chess Zero is AlphaZero's younger sister...
They were made with the same design for a chess engine.
The Leela of today is a lot better than AZ was back in the day. Obviously. Engines get better over time.
It's extremely suspicious that a super powerful engine supposedly appears from nowhere, plays a secret match or two, and then disappears forever. The fact alone that the match was played in secret with absolutely no transparency is enough to raise the suspicion of anyone.
It's possible that they cheated with hardware. AZ was running on the Google supercomputer, while SF was running on a grandma's laptop.
It's also possible that the games were cherry-picked. They played ten thousand games, and then the PR guys picked the 100 that were the most spectacular.
AZ was written specifically to run its evaluations on Google's TPU hardware, the accelerators built for TensorFlow. That's not something Stockfish could run on.
One of the biggest steps was that AZ was a completely self-taught engine: purely through playing itself, it got to the point where it could beat an engine that was very strong and built on opening theory, years of tuning, and brute-force calculation.
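For anyone curious what "self-taught" means in practice, here is a very rough Python sketch of that kind of self-play loop. Everything in it (Net, play_game, train_step) is a placeholder made up for illustration, not DeepMind's actual code; the real system uses a deep neural network, Monte Carlo Tree Search guided by that network, and millions of games run in parallel on TPUs.

```python
import random

class Net:
    """Placeholder for the policy/value network. It starts with random
    weights and knows nothing about chess beyond the rules."""
    def predict(self, position):
        # Returns (move probabilities, expected game outcome) for a position.
        return {}, 0.0

def play_game(net):
    """Placeholder: play one game against itself, guided only by the network
    (plus tree search in the real thing), and record (position, move, result)."""
    _policy, _value = net.predict("start_position")  # network guides move choice
    return [("start_position", "e2e4", random.choice([1, 0, -1]))]

def train_step(net, examples):
    """Placeholder: adjust the network so its move probabilities and value
    predictions better match what self-play actually produced."""
    pass

net = Net()                    # no opening book, no handcrafted evaluation
for iteration in range(1000):  # the real run is millions of games
    examples = []
    for _ in range(100):       # generate training data by playing itself
        examples.extend(play_game(net))
    train_step(net, examples)  # the engine improves purely from its own games
```

The point is the shape of the loop: the engine starts from random weights, generates its own training data by playing itself, and never sees an opening book or a handcrafted evaluation function.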
Sure, they could have tested with a newer engine at the time, and there are certainly some criticisms that can legitimately be made about the configuration of the hardware running Stockfish. That does not take away from the method used to create the engine. The simple fact that LC0 was able to duplicate the results to a similar degree is a very good indication that DeepMind's results were very likely legitimate.
Had DeepMind been interested in anything more than a tech demonstration proving their methods could learn from scratch, they very likely could have kept training the engine, tested it against other engines as well, and then held a match against the best Stockfish version at the time. Unfortunately, that wasn't their goal, and we didn't get anything like that.