Stockfish 12 and newer have neural-network (NNUE) evaluation enabled by default, so I wouldn't say that Stockfish is similar to other non-NN modern engines.
https://nextchessmove.com/dev-builds is based on playing various versions of Stockfish against each other. However, this method is known to overestimate the Elo gain. I believe +70 Elo per doubling of compute is also on the high side, even on single-core machines.
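For calibration, here is a minimal sketch of what these figures mean in terms of expected match score; the formula is the standard Elo expectation, and the numbers plugged in are just the ones under discussion:

```python
# Standard Elo expectation: the expected score of a player who is
# d Elo points stronger than their opponent.
def expected_score(d: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))

print(expected_score(70))   # ~0.60: +70 Elo means scoring ~60% of the points
print(expected_score(300))  # ~0.85
print(expected_score(30))   # ~0.54
```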
I was imagining using Stockfish from before the introduction of NNUE (I think that’s August 2020?). Seems worth being careful about.
I am very interested in the extent to which “play against copies of yourself” overstates the Elo gains.
I am hoping to get some mileage / robustness out of the direct comparison: how much do we have to scale up the old engine (or scale down the new one) for the two to be well matched with each other? Hopefully that will look similar to the numbers from looking directly at Elo.
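As a sketch of that direct comparison (assuming python-chess; the engine paths and match settings are placeholders, not a tuned methodology): give the old engine k times more think time per move and search for the k at which the match is even.

```python
# Sketch of a time-odds match between an old and a new engine, using
# python-chess. Paths and settings are placeholders for illustration.
import chess
import chess.engine

OLD_ENGINE = "./stockfish-old"  # hypothetical binary paths
NEW_ENGINE = "./stockfish-new"

def play_game(old, new, old_time, new_time, old_plays_white):
    """Play one game; return the old engine's score (1, 0.5, or 0)."""
    board = chess.Board()
    while not board.is_game_over():
        old_to_move = (board.turn == chess.WHITE) == old_plays_white
        engine, t = (old, old_time) if old_to_move else (new, new_time)
        board.push(engine.play(board, chess.engine.Limit(time=t)).move)
    white_score = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}[board.result()]
    return white_score if old_plays_white else 1.0 - white_score

def old_engine_score(time_ratio, games=100, base_time=1.0):
    """Average score of the old engine given time_ratio x more time per move."""
    old = chess.engine.SimpleEngine.popen_uci(OLD_ENGINE)
    new = chess.engine.SimpleEngine.popen_uci(NEW_ENGINE)
    try:
        return sum(
            play_game(old, new, base_time * time_ratio, base_time,
                      old_plays_white=(i % 2 == 0))
            for i in range(games)
        ) / games
    finally:
        old.quit()
        new.quit()
```

One would then bisect over `time_ratio` (e.g., over powers of two) until `old_engine_score` returns roughly 0.5; that matched ratio is the compute multiplier the comparison is after.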
(But point taken about the claimed degree of algorithmic progress above.)
Good point: SF12 and newer benefit from NNs indirectly.
Regarding the Elo gain from compute: it's a curve of diminishing returns. At very small compute, a doubling gains you ~+300 Elo; after ~10 doublings that drops to ~+30 Elo. In between lies the ~+70 Elo region, which is where engines usually operate on present hardware with minutes of think time. I'm currently running a set of benchmarks to plot a nice graph of this.
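A sketch of how such a benchmark could be turned into that graph: play each compute level against the same engine at double the compute, convert the match score to an Elo difference, and plot it. The scores below are invented purely to illustrate the shape described above; only the conversion formula is standard.

```python
# Sketch: convert per-doubling match scores into Elo gains and plot the
# diminishing-returns curve. The `scores` list is invented for illustration;
# real values would come from matches at time controls t vs. 2t.
import math
import matplotlib.pyplot as plt

def elo_diff(score: float) -> float:
    # Invert the Elo expectation formula: match score -> Elo difference.
    return -400.0 * math.log10(1.0 / score - 1.0)

# Hypothetical score of the 2x-compute side at increasing base compute:
# ~0.85 (= ~+300 Elo) at tiny compute down to ~0.54 (= ~+30 Elo).
scores = [0.85, 0.78, 0.72, 0.66, 0.62, 0.60, 0.58, 0.56, 0.55, 0.54]
gains = [elo_diff(s) for s in scores]

plt.plot(range(len(gains)), gains, marker="o")
plt.xlabel("doubling step (increasing base compute)")
plt.ylabel("Elo gained per doubling of compute")
plt.show()
```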