Stockfish 12 and newer have neural network (NNUE)-based evaluation enabled by default, so I wouldn’t say that Stockfish is similar to other non-NN modern engines.
I was imagining using Stockfish from before the introduction of NNUE (I think that’s August 2020?). Seems worth being careful about.
https://nextchessmove.com/dev-builds is based on playing various versions of Stockfish against each other. However, it is known that this overestimates the Elo gain. I believe +70 Elo for doubling compute is also on the high side, even on single-core computers.
I am very interested in the extent to which “play against copies of yourself” overstates the Elo gains.
I am hoping to get some mileage / robustness out of the direct comparison—how much do we have to scale up/down the old/new engine for them to be well-matched with each other? Hopefully that will look similar to the numbers from looking directly at Elo.
(But point taken about the claimed degree of algorithmic progress above.)
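For concreteness, here is a rough sketch of the arithmetic behind the direct comparison. It uses the standard logistic Elo formula and assumes a constant Elo gain per doubling of compute, which is a simplification (the +70 figure quoted above is disputed, and the gain probably varies with time control). The function names are made up for illustration.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5) for the side that is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def compute_factor_to_match(elo_gap: float, elo_per_doubling: float) -> float:
    """Factor by which the weaker engine's compute must grow to close elo_gap.

    Assumes every doubling of compute is worth the same number of Elo points,
    which is only an approximation.
    """
    return 2.0 ** (elo_gap / elo_per_doubling)


if __name__ == "__main__":
    # A 70-point edge corresponds to an expected score of roughly 0.6.
    print(round(expected_score(70), 3))
    # Under the constant-gain assumption, a 140-point gap needs 4x the compute.
    print(compute_factor_to_match(140, 70))
```

If the compute factor measured by matching old and new engines directly differs a lot from what the self-play Elo numbers imply through this formula, that would be one sign that self-play overstates the gains.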