A potentially big Model Delta in this conversation is between Yudkowsky-2022 and Yudkowsky-2024. From List of Lethalities:
The AI does not think like you do, the AI doesn’t have thoughts built up from the same concepts you use, it is utterly alien on a staggering scale. Nobody knows what the hell GPT-3 is thinking, not only because the matrices are opaque, but because the stuff within that opaque container is, very likely, incredibly alien—nothing that would translate well into comprehensible human thinking, even if we could see past the giant wall of floating-point numbers to what lay behind.
Vs the parent comment:
I think that the AI’s internal ontology is liable to have some noticeable alignments to human ontology w/r/t the purely predictive aspects of the natural world; it wouldn’t surprise me to find distinct thoughts in there about electrons. As the internal ontology goes to be more about affordances and actions, I expect to find increasing disalignment. As the internal ontology takes on any reflective aspects, parts of the representation that mix with facts about the AI’s internals, I expect to find much larger differences—not just that the AI has a different concept boundary around “easy to understand”, say, but that it maybe doesn’t have any such internal notion as “easy to understand” at all, because easiness isn’t in the environment and the AI doesn’t have any such thing as “effort”. Maybe it’s got categories around yieldingness to seven different categories of methods, and/or some general notion of “can predict at all / can’t predict at all”, but no general notion that maps onto human “easy to understand”—though “easy to understand” is plausibly general-enough that I wouldn’t be unsurprised to find a mapping after all.
Yudkowsky is “not particularly happy” with List of Lethalities, and this comment was made a day after the opening post, so neither quote should be considered a perfect expression of Yudkowsky’s belief. In particular the second quote is more epistemically modest, which might be because it is part of a conversation rather than a self-described “individual rant”. Still, the differences are stark. Is the AI utterly, incredibly alien “on a staggering scale”, or does the AI have “noticeable alignments to human ontology”? Are the differences pervasive with “nothing that would translate well”, or does it depend on whether the concepts are “purely predictive”, about “affordances and actions”, or have “reflective aspects”?
The second quote is also less lethal. Human-to-human comparisons seem instructive. A deaf human will have thoughts about electrons, but their internal ontology around affordances and actions will be less aligned with mine. Someone like Eliezer Yudkowsky has the skill of noticing when a concept definition has a step where its boundary depends on your own internals rather than on pure facts about the environment, whereas I can’t do that because I project the category boundary onto the environment. Someone with dissociative identities may not have a general notion that maps onto my “myself”. Someone who is enlightened may not have a general notion that maps onto my “I want”. And so forth.
Regardless, differing ontologies are still a clear risk factor. The second quote still modestly allows the possibility of a mind so utterly alien that it doesn’t have thoughts about electrons. And there are 42 other lethalities in the list. Security mindset says that risk factors can combine in unexpected ways and kill you.
I’m not sure if this is an update from Yudkowsky-2022 to Yudkowsky-2024. I might expect an update to be flagged as such (e.g. “I now think that...” instead of “I think that...”). But Yudkowsky said elsewhere that he has made some positive updates. I’m curious if this is one of them.