Reading this post made me more optimistic about alignment and AI. My suspension of disbelief snapped; I realized how vague and bad a lot of these “classic” alignment arguments are, and how many of them are secretly vague analogies and intuitions about evolution.
While I agree with a few points on this list, I think this list is fundamentally misguided. The list is written in a language which assigns short encodings to confused and incorrect ideas. I think a person who tries to deeply internalize this post’s worldview will end up more confused about alignment and AI, and I urge new researchers not to spend too much time trying to internalize this post’s ideas. (Definitely consider whether I am right in my claims here. Think for yourself. If you don’t know how to think for yourself, I wrote about exactly how to do it! But my guess is that deeply engaging with this post is, at best, a waste of time.[1])
I think this piece is not “overconfident”, because “overconfident” suggests that Lethalities is simply assigning extreme credences to reasonable questions (like “is deceptive alignment the default?”). Rather, I think both its predictions and questions are not reasonable because they are not located by good evidence or arguments. (Example: I think that deceptive alignment is only supported by flimsy arguments.)
I personally think Eliezer’s alignment worldview (as I understand it!) appears to exist in an alternative reality derived from unjustified background assumptions.[2] Given those assumptions, then sure, Eliezer’s reasoning steps are probably locally valid. But I think that in reality, most of this worldview ends up irrelevant and misleading because the background assumptions don’t hold.
I think this kind of worldview (socially) attempts to shield itself from falsification by e.g. claiming that modern systems “don’t count” for various reasons which I consider flimsy. But I think that deep learning experiments provide plenty of evidence on alignment questions.
But, hey, why not still include this piece in the review? I think it’s interesting to know what a particular influential person thought at a given point in time.
Related writing of mine: “Some of my disagreements with List of Lethalities” and “Inner and outer alignment decompose one hard problem into two extremely hard problems”.
Recommended further critiques of this worldview: “Evolution is a bad analogy for AGI: inner alignment”, “Evolution provides no evidence for the sharp left turn”, and “My Objections to ‘We’re All Gonna Die with Eliezer Yudkowsky’”.
Since Eliezer claims to have figured out so many ideas in the 2000s, his assumptions presumably were locked in before the advent of deep learning. This constitutes a “bottom line.”
I mean, it’s worth considering that his P(DOOM) was substantially lower then. He’s definitely updated on existing evidence, just in the opposite direction from you.