This post consists of comments on summaries of a debate about the nature and difficulty of the alignment problem. The original debate was between Eliezer Yudkowsky and Richard Ngo, but this post does not contain the content from that debate. It consists mostly of commentary by Jaan Tallinn on that debate, with comments by Eliezer.
The post provides a kind of fascinating level of insight into true insider conversations about AI alignment. How do Eliezer and Jaan converse about alignment? Sure, this is a public setting, so perhaps they communicate differently in private. But still. Read the post and you kind of see the social dynamics between them. It’s fascinating, actually.
Eliezer is just incredibly doom-y. He describes in fantastic detail the specific ways that a treacherous turn might play out, over dozens of paragraphs, 3 levels deep in a one-on-one conversation, in a document that merely summarizes a prior debate on the topic. He uses Capitalized Terms to indicate that things like “Doomed Phase” and “Terminal Phase” and “Law of Surprisingly Undignified Failure” are not merely for one-time use but in fact refer to specific nodes in a larger conceptual framework.
One thing that happens often is that Jaan asks a question, Eliezer gives an extensive reply, and then Jaan responds that, no, he was actually asking a different question.
There is one point where Jaan describes his frustration over the years with mainstream AI researchers objecting to AI safety arguments as being invalid due to anthropomorphization, when in fact the arguments were not invalidly anthropomorphizing. There is a kind of gentle vulnerability in this section that is worth reading seriously.
There is a lot of swapping of models of other people, both in and outside the debate. Everyone is trying to model everyone all the time.
Eliezer does unfortunately like to explicitly underscore his own brilliance. He says things like:
I consider all of this obvious as a convergent instrumental strategy for AIs. I could probably have generated it in 2005 or 2010 [...]
But it’s clear enough that probably nobody was ever going to pass the validation set for generating lines of reasoning obvious enough to be generated by Eliezer in 2010 or possibly 2005
I do think that the content itself really comes down to the same basic question tackled in the original Hanson/Yudkowsky FOOM debate. I understand that this debate was ostensibly about a broader question than FOOM. In practice I don’t think this discourse has actually moved on much since 2008.
The main thing the FOOM debate is missing, in my opinion, is this: we have almost no examples of AI systems that can do meaningful sophisticated things in the physical world. Self-driving cars still aren’t a reality. Walk around a city or visit an airport or drive down a highway, and you see shockingly few robots, and certainly no robots pursuing even the remotest kind of general-purpose tasks. Demo videos of robots doing amazing, scary, general-purpose things abound, but where are these robots in the real world? They are always just around the corner. Why?
The main thing the FOOM debate is missing, in my opinion, is this: we have almost no examples of AI systems that can do meaningful sophisticated things in the physical world. Self-driving cars still aren’t a reality.
I think I disagree with this characterization. A) we totally have robot cars by now; B) I think what we mostly don’t have are AIs running systems where the consequence of failure is super high (which maybe happens to be more true of the physical world, but I’d expect it to also be true of critical systems in the digital world).
RE the FOOM debate: On this, I think the Hansonian viewpoint that takeoff would be gradual was way more correct than the discontinuous narrative of Eliezer, where AI progress in the real world follows more of a Hansonian path.
Eliezer didn’t get this totally wrong, and there are some results in AI showing that there can be phase transitions/discontinuities. But overall, a good prior for AI progress is that it will look like the Hansonian continuous progress rather than the FOOM of Eliezer.
Have you personally ever ridden in a robot car that has no safety driver?
Re the FOOM debate: I think the Hansonian viewpoint that takeoff would be gradual has been far more correct than Eliezer’s discontinuity narrative; AI progress in the real world has followed more of a Hansonian path.
Eliezer didn’t get this totally wrong, and there are some results in AI showing that there can be phase transitions/discontinuities. But overall, a good prior is that AI progress will look like Hansonian continuous progress rather than Eliezer’s FOOM.