Eliezer has never denied that neural nets can work (and he provides examples in that linked post of NNs working). Eliezer’s principal objection was that NNs were inscrutable black boxes which would be insanely difficult to make safe enough to entrust humanity-level power to compared to systems designed to be more mathematically tractable from the start. (If I may quip: “The ‘I’, ‘R’, & ‘S’ in the acronym ‘DL’ stand for ‘Interpretable, Reliable, and Safe’.”)
This remains true—for all the good work on NN interpretability, assisted by the surprising levels of linearity inside them, NNs remain inscrutable. To quote Neel Nanda the other day (who has overseen quite a lot of the interpretability research that anyone replying to this comment might be tempted to cite):
Oh man, I do AI interpretability research, and we do not know what deep learning neural networks do. An fMRI scan style thing is nowhere near knowing how it works.
What Eliezer (and I, and pretty much every other LWer at the time who spent any time looking at neural nets) got wrong about neural nets, and has admitted as much, is the timing. (Aside from that, Ms Lincoln...)
To expand a bit on the backstory I also discussed in my scaling hypothesis essay: neural nets seemed like they were a colossally long way away. I don’t know how to convey how universal a sentiment this was, or how astonishingly unimpressive neural nets were in 2008 when he was writing that. I was really interested in NNs at that time because the basic argument of ‘humans are neural nets; therefore, neural nets must work for AGI’ is so obviously correct, but even Schmidhuber, hyping his lab’s work to the skies, had nothing better to show than ‘we can win a contest about some simple handwritten digits’. Oh wow. So amazing, much nets, very win. Truly the AI paradigm of the future… the distant future.
Everyone except Shane Legg was wrong about DL prospects & timing, and even Legg was wrong about important things—if you look at his early writings, he’s convinced that DL will take off and reach human-level in the mid-2020s half because of classic Moravec/Kurzweil/Turing-style projections from Moore’s law, yes, but also half because he’s super enthusiastic about all the wonderful neuroscientific discoveries in the mid-2000s which finally show How The Brain Works (For Real This Time*)™. So DeepMind simply needed to surf the compute wave to snap together all the neuroscience & reinforcement learning modules into something like Agent 57, and hey presto—AGI! But most of that DM neuroscience-inspired research is long since forgotten or abandoned, leading-edge DRL archs look nothing like a brain, the current Transformer architecture owes even less than most to neurobiological inspiration, and it’s unclear how much the Transformer arch matters at all compared to simple scale. DeepMind is now (still) on the back foot, and has suffered an ignominious shotgun merger with Google Brain.
(Google Brain itself is now dissolved as penalty for failing the scaling test. Nor is it the only lab to suffer for failing to call scaling—Microsoft Research is increasingly moribund, and FAIR has apparently suffered major changes too. Maybe that’s why LeCun is so shrill on Twitter, adamantly denying that LLMs have any agentic properties whatsoever, never mind that he’s the cherry-on-top guy… Moravec? Pretty good, but he seems to have downplayed the role of training, overestimated robotics progress, and broadly tended to expect too-early dates. Dario Amodei? A relative late-comer who has published little, and while ‘big blob of compute’ aged well, other claims don’t seem to have—for example, in 2013, he seemed to think that neural nets will not tend to have any particular goals, or that if they do, it’ll be easy to align them and confine them to simply answering questions, and that it’ll be easy to have neural nets which are just looking things up in databases and doing transparent symbolic-logical manipulations on the data. So that ‘tool AI’ perspective has not aged well, and makes Anthropic ironic indeed.)
2023 doesn’t look like what anyone expected until very recently. The current timeline is a surprising place.
* Hinton’s daughter reading this: “Oh dad—not again!”
he seemed to think that neural nets will not tend to have any particular goals, or that if they do, it’ll be easy to align them and confine them to simply answering questions, and that it’ll be easy to have neural nets which are just looking things up in databases and doing transparent symbolic-logical manipulations on the data. So that ‘tool AI’ perspective has not aged well, and makes Anthropic ironic indeed.
I mean, isn’t that what we have? It seems to me that, at least relative to what we expected, LLMs have turned out more human-like, more oracle-like than we imagined?
Maybe that will change once we use RL to add planning.
LLMs have turned out more human-like, more oracle-like than we imagined?
They have turned out far more human-like than Amodei suggested, which means they are not even remotely oracle-like. There is nothing in a LLM which is remotely like ‘looking things up in a database and doing transparent symbolic-logical manipulations’. That’s about the last thing that describes humans too—it takes decades of training to get us to LARP as an ‘oracle’, and we still do it badly. Even the stuff LLMs do which seems transparent, like inner-monologue, is actually just more Bayesian meta-RL agentic behavior: the inner-monologue is a mish-mash of amortized computation and task location, where the model is flexibly using the roleplay as hints, rather than doing what everyone seems to think it does, which is turn into a little Turing machine mindlessly executing instructions (hence e.g. the ability to distill inner-monologue into the forward pass, or to insert errors into few-shot examples or the monologue and still get correct answers).
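To make the error-insertion point concrete, a minimal sketch of that kind of probe might look like the following (the `query_llm` helper is a hypothetical stand-in for whatever completion API you happen to use, and the arithmetic demo is purely illustrative): corrupt the reasoning in a worked few-shot example and check whether the model still answers a fresh question correctly.

```python
# Sketch of the 'insert errors into the few-shot reasoning' probe:
# if the model treats the demo as a hint/task-cue rather than literal
# instructions to execute, the corrupted demo should barely hurt the new answer.

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in: send `prompt` to an LLM and return its completion."""
    raise NotImplementedError("wire this up to your model of choice")

CLEAN_DEMO = (
    "Q: A farmer has 3 pens with 4 sheep in each. How many sheep in total?\n"
    "Reasoning: 3 pens times 4 sheep per pen is 12 sheep.\n"
    "A: 12\n"
)

# Same demo with a deliberately wrong intermediate step and final answer.
CORRUPTED_DEMO = CLEAN_DEMO.replace("is 12 sheep", "is 14 sheep").replace("A: 12", "A: 14")

TEST_QUESTION = (
    "Q: A baker has 5 trays with 6 rolls on each. How many rolls in total?\n"
    "Reasoning:"
)

def run_probe() -> dict:
    """Compare completions conditioned on clean vs. corrupted reasoning demos."""
    return {
        "clean": query_llm(CLEAN_DEMO + "\n" + TEST_QUESTION),
        "corrupted": query_llm(CORRUPTED_DEMO + "\n" + TEST_QUESTION),
    }
```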
I see what you mean, but I meant oracle-like in the sense of my recollection of Nick Bostrom’s usage in Superintelligence: an AI that only answers questions and does not act. In some sense, it’s a measure of how much it is not an agent.
It does seem to me that pretrained LLMs are not very agent-like by default; they are currently constrained to question answering, although that’s changing fast with things like Toolformer.
Even the stuff LLMs do which seems transparent, like inner-monologue, is actually just more Bayesian meta-RL agentic behavior: the inner-monologue is a mish-mash of amortized computation and task location, where the model is flexibly using the roleplay as hints, rather than doing what everyone seems to think it does, which is turn into a little Turing machine mindlessly executing instructions (hence e.g. the ability to distill inner-monologue into the forward pass, or to insert errors into few-shot examples or the monologue and still get correct answers).
It kind of sounds like you are saying that they have a lot of agentic capability, but they are hampered by the lack of memory/planning. If your description here is predictive, then it seems there may be a lot of low-hanging agentic behaviour that can be unlocked fairly easily. Like many other things with LLMs, we just need to “ask it properly”, perhaps using some standard RL techniques like world models.
Do you see the properties/dangers of LLMs changing once we start using RL to make them into proper agents (not just a few-step chat)?
I don’t know what this was a reference to, but amusingly I just noticed that the video I wanted to link was a 2007 lecture at Google by him (if it’s the same Geoffrey Hinton): https://www.youtube.com/watch?v=AyzOUbkUf3M
In it he explained a novel approach to handwriting recognition: stack a series of progressively smaller layers on top of each other until you are down to just a few dozen neurons, then an inverted pyramid on top of this bottleneck, and train the network by feeding it a lot of handwritten characters, using some sort of modified gradient descent so that it learns to reproduce the input image in the topmost layer as accurately as possible. After the network is trained, use supervised learning with labeled data to train a usual small NN to interpret/map the bottleneck-layer activations to characters. And it worked!
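For concreteness, a rough modern re-creation of that recipe might look like the sketch below, using plain backprop on a bottleneck autoencoder rather than the layer-wise pretraining Hinton actually used, and with random tensors standing in as placeholders for the 28×28 digit images:

```python
# Rough sketch: unsupervised bottleneck autoencoder, then a small supervised
# classifier reading characters off the frozen bottleneck codes.
import torch
import torch.nn as nn

encoder = nn.Sequential(                      # shrinking "pyramid" of layers
    nn.Linear(784, 256), nn.Sigmoid(),
    nn.Linear(256, 64), nn.Sigmoid(),
    nn.Linear(64, 30),                        # a few dozen bottleneck units
)
decoder = nn.Sequential(                      # inverted pyramid back to pixels
    nn.Linear(30, 64), nn.Sigmoid(),
    nn.Linear(64, 256), nn.Sigmoid(),
    nn.Linear(256, 784), nn.Sigmoid(),
)

images = torch.rand(512, 784)                 # stand-in for unlabeled digit images
labels = torch.randint(0, 10, (512,))         # stand-in for the labeled subset

# Phase 1: unsupervised -- reconstruct the input through the bottleneck.
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(images)), images)
    loss.backward()
    opt.step()

# Phase 2: supervised -- small classifier maps frozen bottleneck codes to digits.
classifier = nn.Linear(30, 10)
clf_opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
for _ in range(100):
    clf_opt.zero_grad()
    with torch.no_grad():
        codes = encoder(images)               # bottleneck-layer activations
    loss = nn.functional.cross_entropy(classifier(codes), labels)
    loss.backward()
    clf_opt.step()
```

The point of the second phase is the same as in the talk: the labeled data only has to teach a tiny network to interpret the 30-dimensional codes, not to learn pixel-level structure from scratch.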
I find it interesting, especially in the context of your comment, because:
Unsupervised learning meant that you could feed it a LOT of data.
So you could force it to learn patterns in the data without overfitting.
It appeared uninspired by any natural neuronal organization.
It was a precursor to Deep Dream—you could run the network in reverse and see what it imagines when prompted with a specific digit.
It actually worked! And it basically solved handwriting recognition, as far as I understand.
And so it felt like the first qualitative leap in the technology in decades, and a very impressive one at that, innovating in weird and unexpected ways in several respects. Sure, it would be another ten years until GPT-2, but some promise was definitely there, I think.
Partially, but it is still true that Eliezer was critical of NNs at the time; see the comment on the post:
I guess the joke is not as well-known as I thought: https://twitter.com/pmddomingos/status/632685510201241600 (There’s a better page of Hinton stories somewhere but I can’t immediately refind it.)
Those statist AI doomers never miss a chance to bring I, R, and S into everything...
More seriously, thanks for the history lesson!
Death, taxes, and war, you know—you may not be interested in I, R, or S, but they are interested in you.