Welcome to the wonderful world of handicapped chess engine ratings. You're finding out that almost all of the time, the assigned ratings are just guesses. :)
The exact Elo of low-quality engines, of SF at its lower levels, or of old Leela versions...?

Let's try to make accurate guesses then, or at least guesses that make as much sense as possible.
How would you rate a child who has just learned the rules and how the pieces move, but has absolutely zero understanding of tactics and zero training in seeing when a piece is hanging or attacked, and who then plays mainly by intuition, a poor and untrained one?
It should be better than random, but only slightly. Why not put Random at Elo 100 and Child (first day) at Elo 200? We don't really need the bottom few hundred points to fit the Elo formula closely; otherwise we'd need negative Elo. (My guess is that if we wanted to be really accurate, we'd have Child around Elo -200 and Random around Elo -500, and then a player trying to lose even lower. That may be interesting but has no use at all. What matters more to me is a scale on which a beginner can easily pick out the engines in my collection that play at his level. Not that I need this, but it forces me to keep a scale that makes some practical sense.)
Above level 1500, though, it makes sense to respect the Elo formula. If I really make that a goal, I'll probably end up with engines rated much higher than is usually assumed. That would be no surprise, as many people say engines are underrated by a few hundred points. The Elo system cannot work correctly when few games are played between two communities: there are a lot of games between engines and a lot between humans, but very few RATED games between humans and engines. And it's hard to establish a real correspondence, as they don't play the same way.
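The point about anchoring the bottom of the scale can be made concrete. In the usual logistic Elo model, the expected score depends only on the rating difference, so where you put Random is a free choice as long as every other rating moves with it. A minimal Python sketch, assuming the standard 400-point logistic curve; the anchor values are hypothetical, not measured.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def shift_pool(ratings: dict[str, float], offset: float) -> dict[str, float]:
    """Move every rating in a pool by the same offset (re-anchoring the scale)."""
    return {name: r + offset for name, r in ratings.items()}


# Hypothetical anchors: Random at 100 and Child (first day) at 200 ...
pool = {"Random": 100.0, "Child": 200.0}
# ... and the same pool shifted 600 points lower.
shifted = shift_pool(pool, -600.0)

before = expected_score(pool["Child"], pool["Random"])
after = expected_score(shifted["Child"], shifted["Random"])
print(round(before, 4), round(after, 4))  # same prediction either way
```

The same reasoning applies to humans versus engines: two pools that almost never play rated games against each other each carry their own arbitrary offset, and nothing in the formula can pin down the gap between them.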
I was very surprised, for example, to see Pinguin lose THAT HARD against Leela while Leela was still very weak, obviously making many more mistakes than him; everybody knew it. So how did he manage to lose like that? Let him play a few hundred games against the same Leela ID and I really bet he'll win at least 50% of the time. But who wants to play hundreds of games against Leela? There is certainly a solid way to play: wait for her to blunder, knowing that certain situations greatly increase the probability of a blunder. You just need to find it. It's like the Pac-Man arcade game: a beginner finds it impossible to stay alive, but someone who has observed how the enemies move on the screen according to certain rules can escape every time. One funny thing that helps to understand this is watching Leela progress from ID to ID on the official page. Then at home you have a few IDs; let's say ID 24 beats ID 23, ID 23 beats ID 22, ID 22 beats ID 21, etc. So the Elo is improving. That seems logical. But then you try ID 21 versus ID 24 and discover that 21 beats 24. WTF? That really happens. (If you want to check, try ID 9; I'm almost sure it's the most obvious case: it beats ID 8 but, if I recall correctly, is beaten by lower IDs.)
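Checking a set of head-to-head results for this kind of loop is a small graph problem: draw an arrow from each ID to every ID it beat, then look for a cycle. A minimal Python sketch using depth-first search; the `beats` table is a hypothetical encoding of the ID 21 to ID 24 pattern described above, not real match data.

```python
from typing import Optional


def find_cycle(beats: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle in the 'A beat B' graph, or None if results are transitive.

    Depth-first search: an edge back to a node still on the current path closes a loop.
    """
    on_path: list[str] = []
    state: dict[str, str] = {}  # "active" while on the path, "done" once fully explored

    def visit(node: str) -> Optional[list[str]]:
        state[node] = "active"
        on_path.append(node)
        for loser in beats.get(node, []):
            if state.get(loser) == "active":
                # Back edge: the loop runs from `loser` along the path and back to it.
                return on_path[on_path.index(loser):] + [loser]
            if loser not in state:
                cycle = visit(loser)
                if cycle:
                    return cycle
        on_path.pop()
        state[node] = "done"
        return None

    for player in beats:
        if player not in state:
            cycle = visit(player)
            if cycle:
                return cycle
    return None


# Hypothetical results: each ID beats the previous one, yet ID 21 beats ID 24.
beats = {"ID24": ["ID23"], "ID23": ["ID22"], "ID22": ["ID21"], "ID21": ["ID24"]}
print(find_cycle(beats))  # ['ID24', 'ID23', 'ID22', 'ID21', 'ID24']
```

If the function returns a cycle, no single list of ratings can reproduce every "X beats Y" result in that set.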
It's obvious that the Elo progression is partly an illusion there! There is a real progression, but there is also a circular evolution, with strategies beating others without being better from an absolute standpoint, as in Rock-Paper-Scissors. That's why, if I were the SF team, I would absolutely train my Leela against other engines of approximately the same level TOO (not only against herself). That's also why SF8 lost against AZ: had the team been given a few minutes to make some adjustments knowing how AZ plays, as they do against Komodo or Houdini, it would have crushed AZ easily. So engine ratings are both underrated and overrated at the same time... It's no secret that in the superfinal, SF9 was adjusted to beat H6. So there are two differently rated SF9 entities: the one you download and play, which is always the same, and the one the developers adjust for events. The second is a lot stronger than the first, but it's not an engine; it's a team plus an engine.
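Why a single number cannot capture a Rock-Paper-Scissors relationship can be checked directly: fit Elo-style ratings to a perfect cycle and they all come out equal, so the model predicts 50% for every pairing even though each pairing is completely one-sided. A minimal Python sketch, assuming a plain gradient-ascent maximum-likelihood fit (Bradley-Terry on the 400-point Elo scale); the step size and iteration count are arbitrary illustration values.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def fit_ratings(games: list[tuple[str, str]], iters: int = 2000, step: float = 10.0) -> dict[str, float]:
    """Fit ratings to (winner, loser) pairs by gradient ascent on the log-likelihood.

    The constant ln(10)/400 of the true gradient is folded into `step`
    (an illustrative value). Ratings are centred on 0 at the end,
    since only differences are meaningful.
    """
    players = sorted({p for game in games for p in game})
    ratings = {p: 0.0 for p in players}
    for _ in range(iters):
        grad = {p: 0.0 for p in players}
        for winner, loser in games:
            surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
            grad[winner] += surprise
            grad[loser] -= surprise
        for p in players:
            ratings[p] += step * grad[p]
    mean = sum(ratings.values()) / len(ratings)
    return {p: r - mean for p, r in ratings.items()}


# A perfect cycle: each side always beats exactly one other side.
cycle_games = [("Rock", "Scissors"), ("Scissors", "Paper"), ("Paper", "Rock")]
fitted = fit_ratings(cycle_games)
print(fitted)  # all ratings 0.0
print(expected_score(fitted["Rock"], fitted["Scissors"]))  # 0.5, although Rock won that game
```

With ordinary, non-cyclic results (say A wins 3 games out of 4 against B), the same fit does separate the players, so the flat result really comes from the cycle.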

Slowly but more and more clearly, I see that only time and a lot of games will give a relative Elo that is accurate, or at least relevant. Reading about this matter, I realise that most people talking about Elo completely misinterpret what it means. They believe Elo is a real, fixed number that should ideally represent the true level of a player. But that makes no sense at all, because nothing like that exists. There is no such thing as "the real level of a player", and even less a "representation" of it. Elo is an ever-moving number that HELPS evaluate relative strength, not absolute level. Level in chess is relative only. You can treat altitude as an absolute quantity because you have an (almost) perfect reference to compare with, sea level, and all places in the world are measured from (roughly) the same sea level, so their heights relative to each other don't change whichever sea you measure from. Mount Everest will always be higher than Kilimanjaro, and the difference is always the same.
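The "ever-moving number" shows up directly in the standard Elo update: after each game, both ratings move by K times the gap between the actual and the expected score, and whatever one player gains the other loses. A minimal Python sketch; the K-factor of 20 is an illustrative assumption (federations and rating lists use various values).

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(r_a: float, r_b: float, score_a: float, k: float = 20.0) -> tuple[float, float]:
    """Return new (r_a, r_b) after one game; score_a is 1, 0.5 or 0 from A's side.

    k = 20 is only an illustrative default.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    # Points are exchanged within the pool: there is no external "sea level".
    return r_a + delta, r_b - delta


# Two players with equal ratings: a single win moves them 20 points apart.
a, b = update(1500.0, 1500.0, 1.0)
print(a, b)  # 1510.0 1490.0
```

Because points only move between players who actually meet, a rating says where you stand inside your own pool, never where you stand on an absolute scale.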
In chess, your level is relative to each other player individually, and they are all different: if you calculated precisely, you would find you have a different Elo against your brother than against your father, even if both are rated exactly 1500 FIDE. Kasparov fears Kramnik, who fears Shirov, who fears... Kasparov! So it's obvious there is no such thing as an "absolute level". Worse: it changes over time, even against the same brother. In fact, we only have an approximation of the probability that an 1800 beats a 1700, but one particular 1700 may have a 66% chance to beat one particular 1800 if the balance between them is such, against all the "rules" (because there are no rules, just probabilities). And at the top level, Elo ratings are artificially inflated; that's probably why top players all refuse to play against engines. Elo ratings are probably open to cheating; I'm almost certain that at some level you can "buy" Elo points if you want to (not saying it's a widespread practice). Anyway, between chess engines it's the same problem. You don't want to know "which engine has which Elo level", because an engine has no such level. Rather, you want to play many games so that the resulting rating gives a rather accurate... I mean, relevant... or even better, "useful" idea of what the probabilities... not even ARE, but only MAY BE if this one engine were to play against that other one, only to find that it usually doesn't fit very well. And that's it; there's nothing more to get out of it. Expecting more would be pure delusion.
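The "approximation of the probability" can also be read backwards: from a head-to-head score you can compute the rating gap that would explain it, and from the number of games you get a rough idea of how uncertain that gap still is. A minimal Python sketch, assuming the logistic Elo curve and a simple normal (Wald) approximation for the error on the mean score; p(1 - p) is the largest possible per-game variance, so draws would only make the interval narrower.

```python
import math


def gap_from_score(score: float) -> float:
    """Rating difference that makes `score` (0 < score < 1) the expected result."""
    return 400.0 * math.log10(score / (1.0 - score))


def gap_with_margin(points: float, games: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (low, estimate, high) for the rating gap implied by `points` out of `games`.

    Assumption: normal approximation on the mean score with variance p * (1 - p),
    clamped away from 0% and 100% so the logarithm stays finite.
    """
    p = points / games
    half_width = z * math.sqrt(p * (1.0 - p) / games)
    low = max(p - half_width, 1e-6)
    high = min(p + half_width, 1.0 - 1e-6)
    return gap_from_score(low), gap_from_score(p), gap_from_score(high)


# A 1700 who scores 66% against an 1800 performs like someone ~115 points above that 1800.
print(round(gap_from_score(0.66), 1))
# The same 65% score over 20 games versus 400 games: the error bar shrinks a lot.
print([round(x) for x in gap_with_margin(13, 20)])
print([round(x) for x in gap_with_margin(260, 400)])
```

The interval only narrows roughly with the square root of the number of games, which is why a few dozen games between two engines say so little.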
Just to show you how... PROBLEMATIC the problem is.
Engine Score
01: 14#oudi 19.5/21
02: 12#oudi 19.0/21
03: 13#oudi 18.5/20
04: 13®¹Caveman 16.0/21
05: 14®¹Cnstrctr 15.0/22
06: 15®³Marc 13.0/22
07: Pierre17 10.5/20
08: 14Leela 8.5/22
09: $1.Stockfish9 lvl 1/20
10: 16×Txl 3.0/20
11: 15×Txl 3.0/21
12: $0.Stockfish9 lvl 0/20
13: 14×Txl 2.0/21
136 of 780 games played
Name of the tournament: [PRO-BULLET] Step 20
Site/ Country: USER-PC, France
The level is in the name: 14 = Elo 1400 according to the programmers.
As you can see, the Houdini that is supposed to be Elo 1200 is FAR FAR FAR higher than the Texel that is allegedly 1400. It's as if there were no connection at all. The only thing I can do is define my own scale and make the engines fit it through my own adjustments.
And in the Rodent family, the allegedly weakest one is the strongest, and vice versa.
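To put a number on "FAR FAR FAR higher", each engine's tournament score can be turned into a rough performance relative to the average of the opponents it met. A minimal Python sketch, assuming the same logistic formula (rating bodies such as FIDE compute performance with their own tables and rules, so treat this as an approximation). The two scores come from the crosstable above; since only 136 of 780 games were played, the two engines did not necessarily face the same opponents.

```python
import math


def performance_gap(points: float, games: int) -> float:
    """Rough performance relative to the average opponent, from a tournament score.

    Logistic approximation: 400 * log10(p / (1 - p)); undefined for 0% or 100%.
    """
    p = points / games
    if not 0.0 < p < 1.0:
        raise ValueError("a perfect or zero score gives no finite estimate")
    return 400.0 * math.log10(p / (1.0 - p))


# Scores from the crosstable: "12#oudi" (labelled 1200) and "14×Txl" (labelled 1400).
houdini_1200 = performance_gap(19.0, 21)
texel_1400 = performance_gap(2.0, 21)
print(round(houdini_1200), round(texel_1400), round(houdini_1200 - texel_1400))  # 391 -391 782
```

That is roughly 780 points in the opposite direction to the 200 points the labels suggest; with only about 21 games each, the estimates are noisy, but not noisy enough to flip the order.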