I liked how this post tabooed terms and looked at things at lower levels of abstraction than what is usual in these discussions.
I’d compare tabooing to a frame by Tao about how in mathematics you have the pre-rigorous, rigorous and post-rigorous stages. In the post-rigorous stage one “would be able to quickly and accurately perform computations in vector calculus by using analogies with scalar calculus, or informal and semi-rigorous use of infinitesimals, big-O notation, and so forth, and be able to convert all such calculations into a rigorous argument whenever required” (emphasis mine).
Tabooing terms and being able to convert one’s high-level abstractions into mechanistic arguments whenever required seems to be the counterpart in (among others) AI alignment. So, here’s positive reinforcement for taking the effort to try and do that!
Separately, I found the part
(Statistical modeling engineer Jack Gallagher has described his experience of this debate as “like trying to discuss crash test methodology with people who insist that the wheels must be made of little cars, because how else would they move forward like a car does?”)
quite thought-provoking. Indeed, how is talk about “inner optimizers” driving behavior any different from “inner cars” driving the car?
Here’s one answer:
When you train a ML model with SGD—wait, sorry, no. When you try to construct an accurate multi-layer parametrized graphical function approximator, a common strategy is to make small, gradual updates to the current setting of parameters. (Some could call this a random walk or a stochastic process over the set of possible parameter-settings.) Over the course of construction you therefore have multiple intermediate function approximators. What are they like?
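As a toy, entirely hypothetical illustration of "small, gradual updates" as a stochastic process over parameter-settings: fitting a single parameter w to the target function f(x) = 3x by noisy mini-batch steps, and keeping the trajectory of intermediate approximators along the way. (The function, names, and hyperparameters here are all made up for illustration.)

```python
import random

def construct(data, steps=1000, lr=0.01, batch=2, seed=0):
    """Gradually adjust a single parameter w so that x -> w*x fits the data."""
    rng = random.Random(seed)
    w = 0.0
    trajectory = [w]  # the intermediate function approximators
    for _ in range(steps):
        # A random mini-batch makes each step stochastic: a random walk
        # over the set of possible parameter-settings.
        sample = rng.sample(data, batch)
        grad = sum(2 * (w * x - y) * x for x, y in sample) / batch
        w -= lr * grad  # small, gradual update to the current parameters
        trajectory.append(w)
    return w, trajectory

data = [(x, 3.0 * x) for x in range(1, 6)]
w, trajectory = construct(data)  # w ends up near 3.0
```

Each entry of `trajectory` is itself a (mostly inaccurate) function approximator; the construction process just happens to end on an accurate one.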
The terminology of “function approximators” actually glosses over something important: how is the function computed? We know that it is “harder” to construct some function approximators than others, and depending on the amount of “resources” you simply cannot[1] do a good job. Perhaps a better term would be “approximative function calculators”? Or just anything that stresses that there is some internal process used to convert inputs to outputs, instead of this “just happening”.
This raises the question: what is that internal process like? Unfortunately the texts I’ve read on multi-layer parametrized graphical function approximation have been incomplete in these respects (I hope the new editions will cover this!), so take this merely as a guess. In many domains, most clearly games, it seems like “looking ahead” would be useful for good performance[2]: if I do X, the opponent could do Y, and I could then do Z. Perhaps these approximative function calculators implement even more general forms of search algorithms.
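To make the "if I do X, the opponent could do Y, and I could then do Z" pattern concrete, here is a minimal explicit lookahead on a toy game (Nim: players alternately take 1 or 2 stones, whoever takes the last stone wins). This is only an illustration of what a search process is; whatever internal process a trained approximative function calculator implements need not look like this explicit recursion.

```python
def best_move(stones):
    """Return (move, i_win) for the player to act, by looking ahead:
    if I do X, the opponent could do Y, and I could then do Z."""
    for move in (1, 2):
        if move > stones:
            continue
        if move == stones:
            return move, True  # taking the last stone wins immediately
        # Look ahead: after my move, the opponent faces the same problem.
        _, opponent_wins = best_move(stones - move)
        if not opponent_wins:
            return move, True  # this line leaves the opponent losing
    return 1, False  # every line loses; play something anyway

# From 4 stones, taking 1 leaves the opponent the losing position 3.
print(best_move(4))
```

The point of the example is that "good performance" here is achieved precisely by an internal process that enumerates futures, not by a lookup that "just happens".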
So while searching for accurate approximative function calculators we might stumble upon calculators that are themselves searching for something. How neat is that!
I’m pretty sure that under the hood cars don’t consist of smaller cars or tiny car mechanics—if they did, my car building manual would surely have said something about that.
(As usual, assuming standard computational complexity conjectures like P != NP and that one has reasonable lower bounds in finite regimes, too, rather than only asymptotically.)
Or, if you don’t like the word “performance”, you may taboo it and say something like “when trying to construct approximative function calculators that are good at playing chess—in the sense of winning against a pro human or a given version of Stockfish—it seems likely that they are, in some sense, ‘looking ahead’ for what happens in the game next; this is such an immensely useful thing for chess performance that it would be surprising if the models did not do anything like that”.