ANTHROPIC IMMORTALITY
Are other people here having the feeling of “we actually probably messed up AI alignment but I think we are going to survive for weird anthropic reasons”?
[Sorry if this is terrible formatting, sorry if this is bad etiquette]
I think the relevant idea here is the concept of anthropic immortality. It has been alluded to on LW more times than I can count and has even been discussed explicitly in this context: https://alignmentforum.org/posts/rH9sXupnoR8wSmRe9/ai-safety-via-luck-2
Eliezer wrote somewhat cryptic tweets referencing it recently:
https://x.com/ESYudkowsky/status/1138936939892002816
https://x.com/ESYudkowsky/status/1866627455286648891
But for several weeks I’ve wished there was a definitive place on the internet where it is examined, because I have trouble wrapping my mind around the idea: its value, theoretical defects, likelihood (even though it seems to break probability calculations: https://x.com/ESYudkowsky/status/1138938670881239040 )
It doesn’t help that it is related to and/or confused with quantum immortality (QI), which actually shows up on the internet (see in particular: https://www.lesswrong.com/posts/cjK6CTW9DyFAFtKHp/false-vacuum-the-universe-playing-quantum-suicide, https://www.lesswrong.com/posts/hB2CTaxqJAeh5jdfF/quantum-immortality-a-perspective-if-ai-doomers-are-probably), has its own LessWrong entry, and has a Wikipedia article. It doesn’t help either that QI has become kind of a meme at this point.
If you check the context, EY is making the point that anthropic immortality is distinct from QI: https://x.com/knosciwi/status/1866619917979754593, which may be a sign that people got them mixed up?
I feel like there are multiple people “reinventing the wheel” and describing the concept independently.
All this to say:
- maybe someone should compile a broadly accessible entry!
- thinking about doing it myself but I don’t know how valuable it would be (maybe everyone here nodded along to EY tweets and has a clear mind on this topic)
- could the curious coordinate to explore and document the concept together? perhaps we can start a thread to discuss it further
Humbly pinging relevant people, mainly authors from articles I linked to: @avturchin @Jozdien @James_Miller @Halfwit @Vladimir_Nesov
You don’t survive for anthropic reasons. Anthropic reasons explain the situations where you happen to survive by blind luck.
Can you tell me your p(doom) and AGI timeline? Because I think we can theoretically settle this:
I give you $x now, and in y years you give me back $x·r.
Please tell me acceptable y, r for you (ofc in the sense of least-convenient-but-still-profitable).
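For what it's worth, here is a minimal sketch of the arithmetic behind a bet like that, with made-up numbers and function names of my own (not terms anyone above has proposed), assuming linear utility and that nothing is repaid in doom worlds:

```python
# Toy arithmetic for a doom bet: the lender hands over $x now and the
# borrower repays $x * r in y years *only if the world is still around*.
# g is an assumed risk-free growth rate; all figures below are invented.

def break_even_multiplier(p_doom: float, years: float, g: float = 0.04) -> float:
    """Smallest repayment multiplier r at which the lender breaks even
    in expectation, given probability p_doom of doom within `years`."""
    return (1 + g) ** years / (1 - p_doom)

def implied_p_doom(r: float, years: float, g: float = 0.04) -> float:
    """p(doom within `years`) at which a lender is indifferent between
    an offered multiplier r and the risk-free alternative."""
    return 1 - (1 + g) ** years / r

print(break_even_multiplier(p_doom=0.5, years=10))  # ~2.96
print(implied_p_doom(r=3.0, years=10))              # ~0.51
```

(This treats dollars as equally valuable whether or not doom happens, which is the weakest part of the model.)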
Feels deep but I don’t get it.
Would you mind elaborating?
It is actually not clear what EY means by “anthropic immortality”. Maybe he means “Big World immortality”, that is, the idea that an inflationary large universe has infinitely many copies of Earth. From an observational point of view it should not differ much from quantum immortality.
There are two different situations that can follow:
1. Future anthropic shadow. I am more likely to be in a world in which alignment is easy or the AI decided not to kill us for some reason.
2. Quantum immortality. I am alone on an Earth full of aggressive robots and they fail to kill me.
We are working on the next version of my blog post “QI and AI doomers” and will transform it into a proper scientific article.
Same, I’m guessing that by “It actually doesn’t depend on quantum mechanics either, a large classical universe gives you the same result”, EY means that QI is just one way Anthropic Immortality could be true, but “Anthropic immortality is a whole different dubious kettle of worms” seems to contradict this reading.
(Maybe it’s ‘dubious’ because it does not have the intrinsic ‘continuity’ of QI? e.g. you could ‘anthropically survive’ in a completely different part of the universe with a copy of you; but I doubt that would seem dubious to EY?)
I think anthropic shadow lets you say, conditional on survival, that “(for example) a nuclear war or other collapse will have happened”[1], but not that alignment was easy, because alignment being easy would be a logical fact, not a historical contingency; if it’s true, it wouldn’t be true for anthropic reasons. (Although stumbling upon paradigms in which it is easy would be a historical contingency.)
[1] “while civilization was recovering, some mathematicians kept working on alignment theory that did not need computers so that by the time humans could create AIs again, they had alignment solutions to present”
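A toy Bayes calculation of that claim, with entirely invented numbers, just to show the direction of the update: conditioning on our survival shifts probability toward histories that contain an AI-delaying collapse, without saying anything about whether alignment is easy.

```python
# Toy model with two kinds of histories; every number below is invented.
#   "collapse"    - a nuclear war or similar delays AI, so survival is likely.
#   "no collapse" - AI arrives on schedule, so survival is unlikely.

p_collapse = 0.2                       # prior probability of a collapse
p_survive_given_collapse = 0.9
p_survive_given_no_collapse = 0.1

p_survive = (p_collapse * p_survive_given_collapse
             + (1 - p_collapse) * p_survive_given_no_collapse)

# Posterior probability that a collapse happened, given that we survived.
p_collapse_given_survive = p_collapse * p_survive_given_collapse / p_survive

print(round(p_collapse_given_survive, 2))  # 0.69, up from the 0.2 prior
```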
Yes, there are two forms of future anthropic shadow, in the same way as for the Presumptuous Philosopher:
1. Strong form: alignment is easy on theoretical grounds.
2. Weak form: I am more likely to be in a world where some collapse (a Taiwan war) will prevent dangerous AI. And I can see signs of such an impending war now.
Do you think we should be moving to New Zealand (ChatGPT’s suggestion) or something in case of global nuclear war?
New Zealand is a good place, but not everyone can move there, or correctly guess the right moment to do it.
I think we can conceivably gather data on the combination of “anthropic shadow is real & alignment is hard”.
Predictions would be:
1. we will survive this
2. conditional on us finding alien civilizations that reached the same technological level, most of them will have been wiped out by AI.
2. is my guess as to why there is a Great Filter. More so than Grabby Aliens.
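A sketch of how prediction 2 could actually be used as evidence, again with invented likelihoods rather than anything derived from the discussion above: observing that most technologically comparable alien civilizations were wiped out by AI would carry a Bayes factor in favor of “anthropic shadow is real & alignment is hard”.

```python
# Toy evidence calculation; every number below is invented.
# H  = "anthropic shadow is real AND alignment is hard"
# ~H = everything else
# E  = "most alien civilizations at our tech level were wiped out by AI"

p_H = 0.3                # prior on H
p_E_given_H = 0.8        # H strongly predicts E
p_E_given_not_H = 0.2    # E is less expected otherwise

bayes_factor = p_E_given_H / p_E_given_not_H          # 4.0 in favor of H
prior_odds = p_H / (1 - p_H)
posterior_odds = prior_odds * bayes_factor
p_H_given_E = posterior_odds / (1 + posterior_odds)

print(round(bayes_factor, 1), round(p_H_given_E, 2))  # 4.0 0.63
```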
That’s good to know! Best of luck with your project.
The first link is from 2019. (Also those seem like standard EY tweets)
Edit: although there is now also this recent one, from a few hours after your post https://x.com/ESYudkowsky/status/1880714995618767237