paulfchristiano comments on Benchmarking an old chess engine on new hardware

paulfchristiano 16 Jul 2021 15:24 UTC
8 points
Very interesting, thanks!
- Could you confirm how much you have to scale down SF13 in order to match SF3? (This seems similar to what you did last time, but a more direct comparison.)
- The graph from last time makes it look like SF13 would match Rebel at about 20k nodes/move. Could you also confirm that?
- Looking forward to seeing the scaled-up Rebel results.
- hippke 16 Jul 2021 20:13 UTC
  6 points
  - With a baseline of 10 MNodes/move for SF3, I need to set SF13 to 0.375 MNodes/move for equality. That’s a factor of about 27, or roughly 30x. Caveat: I only ran 10 games, which turned out equal, and only at 10 MNodes/move for SF3.
  - Yes: Rebel6 at normal 2021 settings (40 moves in 15 min) can be approximately matched with SF13 at 20 kNodes/move. More precisely: I get parity between Rebel6 (128 MB) and SF13 (128 MB) for 16 MNodes/move vs. 20 kNodes/move (=factor of 800x). On my Intel Core-M 5Y31 (750 kNodes/s), that’s 21s vs. 0.026s per move. Note that the figure shows SF8, not SF13.
  - I was contacted by one person via PM; we are discussing the execution setup. Otherwise, I could do it by the end of July after my vacation.
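The node and time figures in the reply above can be re-derived with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and the inputs are just the numbers quoted in the reply.

```python
# Figures quoted in the reply above; variable names are illustrative.
rebel_nodes_per_move = 16e6   # Rebel6: 16 MNodes/move
sf13_nodes_per_move = 20e3    # SF13: 20 kNodes/move
core_m_nodes_per_s = 750e3    # Intel Core-M 5Y31: 750 kNodes/s

# Node ratio at parity between Rebel6 and SF13.
node_ratio = rebel_nodes_per_move / sf13_nodes_per_move

# Thinking time per move on the Core-M for each engine.
rebel_seconds = rebel_nodes_per_move / core_m_nodes_per_s
sf13_seconds = sf13_nodes_per_move / core_m_nodes_per_s

# SF3 vs. SF13 parity: 10 MNodes/move vs. 0.375 MNodes/move
# (about 26.7, i.e. roughly 30x).
sf3_sf13_ratio = 10e6 / 0.375e6

print(f"Rebel6/SF13 node ratio: {node_ratio:.0f}x")
print(f"Rebel6: {rebel_seconds:.1f} s/move, SF13: {sf13_seconds:.3f} s/move")
print(f"SF3/SF13 node ratio: {sf3_sf13_ratio:.1f}x")
```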
  - hippke 19 Jul 2021 6:06 UTC
    17 points
    I ran the experiment “Rebel 6 vs. Stockfish 13” on Amazon’s AWS EC2. I rented a Xeon Platinum 8124M which benched at 18x 1.5 MNodes/s. I launched 18 concurrent single-threaded game sets with 128 MB of RAM for each engine. Again, ponder was off, no books, no tables. Time settings were 40 moves in 60s + 0.6s per move, corresponding to 17.5 MNodes/move. For reference, SF13 benches at ELO 3630 at this setting (entry “64 bit”); Rebel 6.0 got 2415 on a Pentium 90 (SSDF Computer Rating List (01-DEC-1996).txt, 90 kN/move).
    The result:
    1911 games played
    18 draws
    No wins for Rebel
    All draws when Rebel played white
    ELO difference: 941 ± 63
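As a rough cross-check of the result above, the standard logistic Elo formula can be applied to the raw score. This is a sketch, not the method used in the comment: the function name is illustrative, and the reported 941 ± 63 presumably comes from a rating tool with its own estimation method (an assumption on our part). The plain formula lands near 930, inside the reported error bar.

```python
import math

def elo_difference(score_fraction: float) -> float:
    """Elo gap implied by a score fraction under the logistic Elo model."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

games = 1911
draws = 18
sf13_wins = games - draws              # Rebel won no games
sf13_points = sf13_wins + 0.5 * draws  # a draw counts as half a point
sf13_score = sf13_points / games

gap = elo_difference(sf13_score)
print(f"SF13 score: {sf13_score:.4f}, implied Elo gap: {gap:.0f}")
```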
    Interpretation:
    Starting from 3630 for SF13, that corresponds to Rebel on a modern machine: 2689.
    Up from 2415, that’s +274 ELO.
    The ELO gap between Rebel on a 1994 Pentium 90 (2415) and SF13 on a 2020 PC (3630) is 1215 points. Of these, 274 points are closed with matching hardware.
    That gives 23% for the compute, 77% for the algorithm.
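The interpretation above is plain arithmetic on three numbers from the comment; a short sketch (variable names are illustrative) makes each step explicit.

```python
# Ratings and measured gap as quoted in the comment.
sf13_elo = 3630       # SF13 on a 2020 PC
rebel_p90_elo = 2415  # Rebel 6.0 on a Pentium 90 (SSDF list)
measured_gap = 941    # SF13 minus Rebel on matching hardware

rebel_modern_elo = sf13_elo - measured_gap         # Rebel on a modern machine
hardware_gain = rebel_modern_elo - rebel_p90_elo   # gain from compute alone
total_gap = sf13_elo - rebel_p90_elo               # full 1994-to-2020 gap

compute_share = hardware_gain / total_gap
algo_share = 1.0 - compute_share

print(f"Rebel on modern hardware: {rebel_modern_elo}")
print(f"Hardware gain: +{hardware_gain} of {total_gap} ELO")
print(f"Compute: {compute_share:.0%}, algorithm: {algo_share:.0%}")
```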
    Final questions:
    Isn’t +274 ELO too little for 200x compute?
    We found 50% algo/50% compute for SF3-SF13. Why is that?
    Answer: ELO gain with compute is not a linear function, but one with diminishing returns. The percentage “due to algo” therefore increases with the length of the time frame, so a single fixed percentage is not a good answer.
    But we can give the share due to compute as a function of the time gap:
    Over 10 years, it’s ~50%
    Over 25 years, it’s ~23%
    With data from other sources (SF8, Houdini 3) I made this figure to show the effect more clearly. The dashed black line is a double-log fit function: A base-10 log for the exponential increase of compute with time, and a natural log for the exponential search tree of chess. The parameter values are engine-dependent, but should be similar for engines of the same era (here: Houdini 3 and SF8). With more and more compute, the ELO gain approaches zero. In the future, we can expect engines whose curve is shifted to the right side of this plot.
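The comment does not give the exact form of the fit function. One plausible reading of "double-log" is `elo(n) = a + b * ln(log10(n))`, with `n` the nodes searched per move. The sketch below uses that assumed form with made-up parameters `A` and `B` (not fitted to any data from the thread) purely to illustrate the diminishing return per doubling of compute.

```python
import math

# Hypothetical parameters, NOT fitted to any data from the thread.
A = 2000.0
B = 600.0

def elo_double_log(nodes_per_move: float, a: float = A, b: float = B) -> float:
    """Assumed double-log curve: natural log of a base-10 log of compute.

    Only defined for nodes_per_move > 1, where log10 is positive.
    """
    return a + b * math.log(math.log10(nodes_per_move))

# ELO gained by doubling compute, evaluated at increasing node counts.
node_counts = [1e4, 1e5, 1e6, 1e7, 1e8]
doubling_gains = [elo_double_log(2 * n) - elo_double_log(n) for n in node_counts]

for n, gain in zip(node_counts, doubling_gains):
    print(f"{n:>12,.0f} nodes/move: doubling adds {gain:5.1f} ELO")
```

Under this assumed form the gain per doubling is `b * ln(1 + log10(2) / log10(n))`, which shrinks as `n` grows, matching the qualitative claim that additional compute buys less and less ELO.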