I was reading about endgame databases and happened to notice Thompson discussing a bit the scaling of brute-force search, and whether it was ‘asymptotic’ or flattening.
Ken Thompson, oral history, February 2005
...Q: So at that stage how many plies was it [Belle] going and what kind of rating did it have?
Ken Thompson: It was going 8 full plies easily, the other one was probably going 6 and was easily master, probably 2300. I never worked for rating. You can squeeze rating points out by studying your opponents and booking up and taking away—put a bunch of trap kind of stuff in. But I wanted to just play nice solid chess, so it was probably 2300.
Q: And I think at some point I recall you talking about the ratings compared to compute power, and you had, I think, extrapolated what kinds of compute power it would take to get certain improvements in ratings.
Ken Thompson: Yeah...
Q: I can’t remember when that was, it must have been early ’80s or something...
Ken Thompson: Yeah it was early ’80s—It was asymptotic but it was in the high rise part of the asymptote so you can’t really tell whether—where it rolled over. But it would, certainly in that era you were gaining—for every doubling of horsepower you’d gain a hundred points. I came up with some really fake-y back of the envelope calculations that said at around 12 or 13 ply you’d be world class. Maybe not—you’d compete for the world title, you may not win it but you’d be comparable.
Q: Presumably the ratings in some sense get a little less accurate at the very top I would think...
Ken Thompson: Yes, yes.
The WP article on Belle notes some of this in more detail and gives two references, a 1983 and a 1997 book. Quoting a bit (both are on Libgen and easy to access):

1983 says
Newborn [155] has suggested that if you double the speed of a brute
force program, you will gain 100 rating points. If you assume a 5.5 branching factor, then a ply will yield 250 points. In the linear range of P4-P6
(the range observed by Newborn in existing programs) this relation holds.
In the extended range of this experiment, ratings do not grow as fast
as predicted. Any major conclusions about the experiment we leave up
to the reader. Here are some of our observations that may be of help.
From [the book](https://cloudflare-ipfs.com/ipfs/bafykbzacebnknut4q77j4foxnybstbry5w3g5poigcqase47g4cryanhweags?filename=Monty Newborn (auth.) - Kasparov versus Deep Blue_ computer chess comes of age-Springer-Verlag New York (1997).pdf#page=132)
Thompson conducted two experiments in the late 1970s to study this
issue. In his first experiment, he had one version of BELLE play a series of
twenty games against an identical second version, except that one version,
call it BELLE(P), searched to a depth of, say, P levels while the other,
BELLE(P + 1), searched to a depth of P + 1 levels. The results of this experiment are presented in Figure 6.24a for 3 ≤ P ≤ 8. It shows, for example,
that BELLE(3) won four of the twenty points against BELLE(4), and that
BELLE(4) won five and a half of twenty points against BELLE(5). His second
experiment, carried out shortly thereafter, was more extensive: for 4 ≤ P, Q ≤ 9 and for P ≠ Q, BELLE(P) played a twenty-game series against BELLE(Q).
The results are presented in Figure 6.24b. Rather than relating rating improvement to the number of positions searched, Thompson related rating
improvement to the depth of search, pointing out that an improvement of
one hundred points for each doubling of speed was approximately equivalent to an improvement of 250 points for each additional level of search.
The main difference between the curve of ratings shown in Figure 6.23
and Thompson’s results is that the relation between ratings and search depth
appears to be almost linear to about 2500 while Thompson found that rating improvement drops off from linear at ratings above 2000 points. Certainly, eventually the curve approaches the limit imposed by perfect play but
the question is how fast. Thompson evidently did not study deeper search
depths because of the large amount of time required to obtain results.
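As a sanity check on the quoted equivalence between 100 Elo points per doubling of speed and 250 points per additional ply: one extra ply multiplies the number of positions searched by roughly the branching factor, i.e. log₂(b) doublings of compute. A minimal sketch of the arithmetic (the 5.5 branching factor is the one assumed in the quote):

```python
import math

branching_factor = 5.5
elo_per_doubling = 100

# One extra ply costs log2(b) doublings of compute...
doublings_per_ply = math.log2(branching_factor)
# ...so at 100 Elo per doubling, a ply is worth about:
elo_per_ply = doublings_per_ply * elo_per_doubling

print(round(elo_per_ply))  # ≈ 246, which the books round to 250
```

So the two rules of thumb are indeed the same claim, up to rounding.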
[The next part discusses a different way to try to extrapolate based on differences in tree search choice, which reminds me a little of Paul Christiano’s counting argument/coincidence in Andy Jones’s RL scaling paper.]
The descriptions here of flattening are suspicious, given the economizing on compute. It doesn’t look like Thompson ever clearly revisited the topic later than the ’80s, or considered functional forms beyond increasing 1 ply at a time and expecting n Elo points. I would not be surprised if Belle continued scaling the same way as in Jones, and here again practitioners mistake a log or power law for ‘flattening out’ (and thus futility compared to Bitter-Lesson-vulnerable-but-more-engineerable approaches).
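Incidentally, the per-ply match scores quoted above can themselves be converted into Elo differences under the standard logistic Elo model (a sketch; the 4/20 and 5.5/20 figures are the ones given for Figure 6.24a):

```python
import math

def elo_diff(score: float, games: int) -> float:
    """Elo difference implied by a match score under the
    standard logistic model: E = 1 / (1 + 10^(-d/400))."""
    s = score / games
    return 400 * math.log10(s / (1 - s))

# BELLE(3) took 4 of 20 points from BELLE(4):
print(round(elo_diff(4, 20)))    # -241: one ply worth ~240 Elo here
# BELLE(4) took 5.5 of 20 points from BELLE(5):
print(round(elo_diff(5.5, 20)))  # -168: already a smaller per-ply gain
```

On this toy conversion the per-ply Elo gain shrinks between adjacent depths, but with 20-game matches the error bars are wide, which is part of why distinguishing a genuine asymptote from a slowly-bending log or power law is hard at these sample sizes.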