Is your prediction that e.g. the behavior of chess will be unrelated to the behavior of SAT solving, or to factoring? Or that “those kinds of things” can be related to each other but not to image classification? Or is your prediction that the “new regime” for chess (now that ML is involved) will look qualitatively different than the old regime?
There are problems where one paper reduces the compute requirements by 20 orders of magnitude. Or gets us from couldn’t do X at all, to able to do X easily.
I’m aware of very few examples of that occurring for problems that anyone cared about (i.e. in all such cases we found the breakthroughs before they mattered, not after). Are you aware of any?
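(For a sense of the scale in that quoted claim: a 20-order-of-magnitude reduction in compute requirements is roughly what a century of hardware progress would deliver. The sketch below just makes that arithmetic explicit; the 1.5-year doubling time is an assumed, illustrative figure, not something from this exchange.)

```python
import math

# How much hardware progress would it take to match a 20-order-of-magnitude
# reduction in compute requirements? (The 1.5-year doubling time is an
# assumed, illustrative figure.)
orders_of_magnitude = 20
doubling_time_years = 1.5

doublings = orders_of_magnitude * math.log2(10)  # ~66 doublings
years = doublings * doubling_time_years          # ~100 years

print(f"{doublings:.1f} doublings, i.e. ~{years:.0f} years of hardware progress")
```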
a prime factoring algorithm is maths
Factoring algorithms, or primality checking, seem like fine domains to study to me. I’m also interested in those and would be happy to offer similar bounties for similar analyses.
You have a spectrum of possible reference classes for transformative AI, ranging from almost purely software-driven progress to almost totally hardware-driven progress.
I think it’s pretty easy to talk about what distinguishes chess, SAT, classification, or factoring from multiplication. And I’m very comfortable predicting that the kind of AI that helps with R&D is more like the first four than like the last (though these things are surely on a spectrum).
You may have different intuitions, and I think that’s fine; in that case this explains part of why this data is more interesting to me than to you.
Progress on chess AIs contained no breakthroughs, no fundamental insights, only a slow accumulation of little tricks.
Can you point to a domain where increasing R&D led to big insights that improved performance?
Perhaps more importantly, machine learning is also “a slow accumulation of little tricks,” so the analogy seems fine to me. (You might think that future AI is totally different, which is fine and not something I want to argue about here.)
To gain more information about transformative AI, someone would have to make either a good case for why it should sit at a particular position on that scale, or a good case for why its position should be similar to that of some piece of past research. In the latter case, we can gain information by examining the position of that research topic. If hypothetically that topic were chess, then the research you propose would be useful. If the reason you chose chess was purely that you thought it was easier to measure, then the results are likely useless.
If Alice says this and so never learns about anything, and Bob instead learns a bunch of facts about a bunch of domains, I’m pretty comfortable betting on Bob being more accurate about most topics.
I think the general point is: different domains differ from one another. You want to learn about a bunch of them and see what’s going on, in order to reason about a new domain.
The consistency of chess performance looks like more selection bias. You aren’t choosing a problem domain where there was one huge breakthrough. You are choosing a problem domain that has had slow, consistent progress.
I agree with the basic point that board games are selected to be domains where there is an obvious simple thing to do, and so progress started early. I think in that way they are similar to SAT solving and factoring, and (slightly) different from image classification, for which it’s arguably hard to measure progress before the 90s.
I think that the more important difference between chess and image classification (in terms of making the chess data clean) is that there is a homogeneous measure of performance for chess, whereas image classification has moved on to harder and harder tasks. I think this does slightly change the nature of the task, but mostly it just makes the data clean.
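To make the “homogeneous measure of performance” point concrete: engines from any era can be placed on a single Elo scale, and a fixed rating gap always implies the same expected score. A minimal sketch using the standard Elo formula (the specific ratings below are made up purely for illustration):

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A 400-point gap means the same thing whether it separates two early
# programs or two modern engines: roughly a 91% expected score.
print(elo_expected_score(2000, 1600))  # ~0.909
print(elo_expected_score(3400, 3000))  # ~0.909
```

There is no analogous single scale for image classification, since accuracy numbers on different benchmarks aren’t directly comparable; that is the sense in which the task keeps moving.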
I think that the main difference between chess and SAT solving is that chess is more naturally interesting to people, so they’ve been working on it longer, and that is a real factor that makes the data cleaner (without making it less useful as an analogy). SAT solving also has some of the image classification problem of depending a lot on the distribution of instances.
(With respect to “aren’t choosing a domain where there was one huge breakthrough,” I’m definitely interested in domains with such breakthroughs.)