Yeah I think I agree. It also applies to most research about inductive biases of neural networks (and all of statistical learning theory). Not saying it won’t be useful, just that there’s a large mysterious gap between great learning theories and alignment solutions and inside that gap is (probably, usually) something like the levels-of-abstraction mistake.
This definitely sounds like a mistake someone could make while thinking about singular learning theory or neuroscience, but I don’t think it sounds like a mistake that I’m making? It does in fact seem a lot easier to go from a theory of how model structure, geometry, & rewards maps to goal generalization to a theory of values than it does to go from the mechanics of transistors to tetris, or the mechanics of neurons to a theory of values.
The former problem (structure-geometry-&-rewards-to-goals to value-theory) is not trivial, but seems like only one abstraction leap, while the other problems seem like very many abstraction leaps (7 to be exact, in the case of transistors->tetris).
The problem is not that abstraction is impossible, but that abstraction is hard, and you should expect to be done in 50 years if launching a research program requiring crossing 7 layers of abstraction (the time between Turing’s thesis, and making Tetris). If just crossing one layer, the same crude math says you should expect to be done in 7 years (edit: Though also going directly from a high level abstraction to an adjacently high level abstraction is a far easier search task than trying to connect a very low level abstraction to a very high level abstraction. This is also part of the mistake many make, and I claim its not a mistake that I’m making).
I think this applies to Garrett Baker’s hopes for the application of singular learning theory to decoding human values.
Yeah I think I agree. It also applies to most research about inductive biases of neural networks (and all of statistical learning theory). Not saying it won’t be useful, just that there’s a large mysterious gap between great learning theories and alignment solutions and inside that gap is (probably, usually) something like the levels-of-abstraction mistake.
This definitely sounds like a mistake someone could make while thinking about singular learning theory or neuroscience, but I don’t think it sounds like a mistake that I’m making? It does in fact seem a lot easier to go from a theory of how model structure, geometry, & rewards maps to goal generalization to a theory of values than it does to go from the mechanics of transistors to tetris, or the mechanics of neurons to a theory of values.
The former problem (structure-geometry-&-rewards-to-goals to value-theory) is not trivial, but seems like only one abstraction leap, while the other problems seem like very many abstraction leaps (7 to be exact, in the case of transistors->tetris).
The problem is not that abstraction is impossible, but that abstraction is hard, and you should expect to be done in 50 years if launching a research program requiring crossing 7 layers of abstraction (the time between Turing’s thesis, and making Tetris). If just crossing one layer, the same crude math says you should expect to be done in 7 years (edit: Though also going directly from a high level abstraction to an adjacently high level abstraction is a far easier search task than trying to connect a very low level abstraction to a very high level abstraction. This is also part of the mistake many make, and I claim its not a mistake that I’m making).