I think the surprising lesson of GPT-4 is that it is possible to build clearly below-human-level systems that are nevertheless capable of fluent natural language processing, knowledge recall, creativity, basic reasoning, and many other abilities previously thought by many to be strictly in the human-level regime.
Once you update on that surprise, though, there’s not really much left to explain. The ability to distinguish moral from immoral actions at an average human level follows directly from being superhuman at language fluency and knowledge recall, and somewhere below the human average at basic deductive and consequentialist reasoning.
MIRI folks have consistently said that all the hard problems kick in once you get to the human-level regime and above. So even if it’s relatively more surprising on their world models that a thing like GPT-4 can exist, it’s not actually much evidence (on their models) about how hard various alignment problems will be when dealing with systems at human level and above.
Similarly:
If you disagree that AI systems in the near-future will be capable of distinguishing valuable from non-valuable outcomes about as reliably as humans, then I may be interested in operationalizing this prediction precisely, and betting against you. I don’t think this is a very credible position to hold as of 2023, barring a pause that could slow down AI capabilities very soon.
I don’t disagree with this, but I think it is also a direct consequence of the (easy) prediction that AI systems will keep getting closer and closer to human-level generality and capability in the near term. The question is what happens when they cross that threshold decisively.
BTW, another (more pessimistic) way you could update from the observation of GPT-4’s existence is to conclude that it is surprisingly easy to get (at least a kernel of) general intelligence by optimizing a seemingly random thing (next-token prediction) hard enough. I think this is partially what Eliezer means when he claims that “reality was far to the Eliezer side of Eliezer on the Eliezer-Robin axis”. Relative to Robin, Eliezer predicted at the time that general abstract reasoning would be easy to develop, scale, and share.
But even Eliezer thought you would still need some kind of detailed understanding of the actual underlying cognitive algorithms to initially bootstrap from, using GOFAI methods, complicated architectures / training processes, etc. It turns out that just applying SGD to very regularly structured networks on the problem of text prediction is sufficient to hit on (weak versions of) such algorithms incidentally, at least if you do it at scales several orders of magnitude larger than people were considering in 2008.
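To make concrete what “applying SGD to text prediction” cashes out to, here’s a deliberately tiny sketch of such a training loop: my own toy illustration, with a minuscule recurrent network, a made-up corpus, and arbitrary hyperparameters, nothing like an actual LLM setup (which uses transformers at vastly larger scale). The point is just that the entire training signal is next-token prediction.

```python
# Toy sketch: the whole "objective" is predicting the next character of a tiny corpus.
# Everything here (corpus, model size, learning rate) is an arbitrary placeholder.
import torch
import torch.nn as nn

text = "the quick brown fox jumps over the lazy dog "
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([stoi[ch] for ch in text])

class TinyNextTokenModel(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)  # real LLMs: transformers, huge scale
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)  # logits for the next token at each position

model = TinyNextTokenModel(len(vocab))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Inputs are tokens 0..n-1, targets are tokens 1..n: pure next-token prediction.
inputs, targets = data[:-1].unsqueeze(0), data[1:].unsqueeze(0)
for step in range(500):
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, len(vocab)), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(step, loss.item())
```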
My own personal update from observing GPT-4 and the success of language models more generally is: a small update towards some subproblems in alignment being relatively easier, and a massive update towards capabilities being way easier. Both of these updates follow directly from the surprising observation that GPT-4-level systems apparently occupy a wide and natural band of the below-human capabilities spectrum.
In general, I think non-MIRI folks tend to over-update on observations and results about below-human-level systems. It’s possible that MIRI folks are making the reverse mistake of not updating hard enough, but small updates or non-updates from below-human systems look basically right to me, under a world model where things predictably break down once you go above human-level.
GOFAI methods, complicated architectures / training processes, etc.

“Nope” to this part. I otherwise like this comment a lot!

I meant something pretty general and loose, with all of these things connected by a logical OR. My definition of GOFAI includes things like minimax search and MCTS, but the Wikipedia page for GOFAI only mentions ELIZA-like stuff from the 60s, so maybe I’m just using the term wrong.
My recollection was that 2008!Eliezer was pretty agnostic about which particular methods might work for getting to AGI, though he still mostly or entirely ruled out stuff like Cyc.
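For concreteness, the kind of hand-coded search I have in mind when I say minimax counts as GOFAI is roughly the sketch below. The function and the toy single-pile Nim example are my own illustrative choices, not code from any particular system.

```python
# Bare-bones minimax: the programmer hand-codes the move generator and the
# evaluation function; the "intelligence" is explicit tree search, not learning.
def minimax(state, depth, maximizing, get_moves, apply_move, evaluate):
    """Return the minimax value of `state`, searching up to `depth` plies ahead."""
    moves = get_moves(state)
    if depth == 0 or not moves:
        return evaluate(state, maximizing)
    child_values = (
        minimax(apply_move(state, move), depth - 1, not maximizing,
                get_moves, apply_move, evaluate)
        for move in moves
    )
    return max(child_values) if maximizing else min(child_values)

# Toy usage: single-pile Nim, remove 1-3 stones per turn, taking the last stone wins.
# Facing an empty pile on your turn means the opponent just took the last stone.
value = minimax(
    state=10, depth=12, maximizing=True,
    get_moves=lambda pile: [k for k in (1, 2, 3) if k <= pile],
    apply_move=lambda pile, k: pile - k,
    evaluate=lambda pile, max_to_move: 0 if pile > 0 else (-1 if max_to_move else 1),
)
print(value)  # 1: the first player can force a win from a pile of 10
```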