Other people were commending your tabooing of words, but I feel using terms like “multi-layer parameterized graphical function approximator” fails to do that, and makes matters worse because it leads to non-central fallacy-ing. It’d have been more appropriate to use a term like “magic” or “blipblop”. Calling something a function approximator leads readers to carry a lot of associations into their interpretation that probably don’t apply to deep learning, since deep learning is a very specific example of function approximation that deviates from the prototypical examples in many respects. (I think when you say “function approximator”, the image that pops into most people’s heads is fitting a polynomial to a set of data points in R^2.)
Calling something a function approximator is only meaningful if you make a strong argument for why a function approximator can’t (or at least is systematically unlikely to) give rise to specific dangerous behaviors or capabilities. But I don’t see you giving such arguments in this post; maybe I did not understand it. In either case, you can read posts like Gwern’s “Tools want to be agents” or Yudkowsky’s writings, explaining why goal-directed behavior is a reasonable thing to expect to arise from current ML, and you can replace every instance of “neural network” / “AI” with “multi-layer parameterized graphical function approximator”, and I think you’ll find that all the arguments make just as much sense as they did before. (Modulo some associations seeming strange, but like I said, I think that’s because there is some non-central fallacy-ing going on.)
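(For concreteness, the “fitting a polynomial to data points in R^2” picture referred to above is roughly the following minimal sketch; the data here is made up purely for illustration.)

```python
# Prototypical "function approximation": fit a low-degree polynomial to a
# handful of (x, y) points in the plane. Data is invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(x.shape)   # noisy made-up target

coeffs = np.polyfit(x, y, deg=5)    # least-squares polynomial fit
approx = np.polyval(coeffs, x)      # evaluate the fitted polynomial at the data points

print("coefficients:", coeffs)
print("mean squared error:", np.mean((approx - y) ** 2))
```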
That deviates from the prototypical examples in many respects.
It basically proves too much because it’s equivocation. I am struggling to find anything in Zack’s post which is not just the old wine of the “just” fallacy in new ‘function approximation’ skins. When someone tells you that an LLM is “just” next token prediction, or a neural network is “just some affine layers with nonlinearities” or it’s “just a Markov chain with a lot of statistics”, then you’ve learned more about the power and generality of ‘next token prediction’ etc. than you have about what they were trying to debunk.
If I use the mean squared error loss function to approximate a set of data points in the plane with a line (which some authors call a “linear regression model” for some reason), obviously the line itself does not somehow contain a representation of general squared-error-minimization. The line is just a line.
I don’t think that is obvious at all, and is roughly on the level of saying ‘a tiger is just a set of atoms in a 3D volume’ or ‘programs are just a list of bits’. What are these data points on this hyperplane, exactly...? They could be anything; they could be, say, embeddings of optimization algorithms*. If you had an appropriate embedding of the latent space of algorithms, why can’t there be a point (or any other kind of object or structure, such as a line) which corresponds to general squared-error minimization, or others? And this doesn’t merely seem ‘possible’; it seems likely: already an existing LLM like GPT-4 or Claude-3 Opus is clearly mapping its few-shot examples to some sort of latent space, appears to be doing internal gradient descent on higher-level embeddings or manifolds, and is quite effective at writing things like ‘here are 10 variants of error minimization optimization algorithms as examples; write a Python program using Numpy to write a new one which [...]’ (and if it is not, it sure seems like future ones will); something inside that LLM must correspond to a small number of bytes, some sort of mathematical object representing the combination of those points. Programs and algorithms are ‘just’ data points, like everything else. (“With an appropriate encoding, an AGI is just an index into the decimal expansion of pi; an index, a simple ordinary natural number, is obviously completely harmless; QED, AGI is completely harmless.”) Which means that if you think something is ‘just’ data, then your assertion is vacuous: if it applies to everything, it means nothing.
* Or more relevantly, RL algorithms such as specific agents… of course a line or point on an appropriate hyperplane can ‘represent’ the loss function, and it would be useful to do so. Why would you ever want a system to not be able to do that? We can see in Decision Transformers and research into the meta-reinforcement-learning behavior of LLMs that large LLMs prompted with reward functions & scenarios, trained by imitation learning at scale across countless agents, do what must be exactly that, representing environments and rewards, and can, e.g., explicitly write the source code to implement reward functions for RL training of sub-agents. I think it’s telling that these results generally do not come up in these posts about how ‘reward is not the optimization target’.
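(For concreteness, the linear-regression example quoted above amounts to something like the following minimal sketch, with invented data: the fitted object is just two parameters, and the disagreement here is over how much more richer parameterizations can come to represent.)

```python
# The quoted example: fit a line to points in the plane by minimizing
# mean squared error (ordinary least squares). Data is invented.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.standard_normal(x.shape)   # noisy data from a "true" line

# Least-squares fit: minimizes mean((slope * x + intercept - y) ** 2)
slope, intercept = np.polyfit(x, y, deg=1)

print("fitted parameters:", slope, intercept)   # the fitted "line" is just these two numbers
print("mean squared error:", np.mean((slope * x + intercept - y) ** 2))
```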
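(And as for the footnote’s claim about writing reward-function source code: the kind of code meant is nothing exotic. Here is a hypothetical sketch of what such generated code might look like; the environment interface and shaping terms are made up for illustration.)

```python
# Hypothetical sketch of a reward function an LLM might be asked to write for
# RL training of a sub-agent. State/action shapes and shaping terms are invented.
import numpy as np

def reward_fn(state: np.ndarray, action: np.ndarray, next_state: np.ndarray) -> float:
    """Reward progress toward a (made-up) goal position, penalizing control effort."""
    goal = np.zeros_like(state)                    # hypothetical goal: the origin
    progress = np.linalg.norm(state - goal) - np.linalg.norm(next_state - goal)
    effort_penalty = 0.01 * float(np.sum(action ** 2))
    return float(progress - effort_penalty)

# Usage with invented values:
s = np.array([1.0, 2.0])
a = np.array([0.1, -0.2])
s_next = np.array([0.8, 1.7])
print(reward_fn(s, a, s_next))
```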
I am struggling to find anything in Zack’s post which is not just the old wine of the “just” fallacy [...] learned more about the power and generality of ‘next token prediction’ etc. than you have about what they were trying to debunk.
I wouldn’t have expected you to get anything out of this post!
Okay, if you project this post into a one-dimensional “AI is scary and mysterious” vs. “AI is not scary and not mysterious” culture war subspace, then I’m certainly writing in a style that mood-affiliates with the latter. The reason I’m doing that is because the picture of what deep learning is that I got from being a Less Wrong-er felt markedly different from the picture I’m getting from reading the standard textbooks, and I’m trying to supply that diff to people who (like me-as-of-eight-months-ago, and unlike Gwern) haven’t read the standard textbooks yet.
I think this is a situation where different readers need to hear different things. I’m sure there are grad students somewhere who already know the math and could stand to think more about what its power and generality imply about the future of humanity or lack thereof. I’m not particularly well-positioned to help them. But I also think there are a lot of people on this website who have a lot of practice pontificating about the future of humanity or lack thereof, who don’t know that Simon Prince and Christopher Bishop don’t think of themselves as writing about agents. I think that’s a problem! (One which I am well-positioned to help with.) If my attempt to remediate that particular problem ends up mood-affiliating with the wrong side of a one-dimensional culture war, maybe that’s because the one-dimensional culture war is crazy and we should stop doing it.
I don’t object to ironic exercises like “A Modest Proposal” or “On The Impossibility of Supersized Machines”, but such pieces can certainly be difficult to write in a way that ensures the message gets across, and I don’t think it gets across here.