Agents which allow such considerations to seriously influence their actions aren't just less fit; they die immediately. I don't mean that as hyperbole. I mean that you can conduct a Pascal's Mugging on them constantly until they die. "Give me $5, and I'll give you infinite resources outside the simulation. Refuse, and I will simulate an infinite number of copies of everyone on Earth being tortured for eternity" (if infinity is the objection, replace it with very large numbers expressed in up-arrow notation). If your objection is that you're OK with being poor, replace losing $5 with <insert nightmare scenario here>.
This holds even if the reasoning about the simulation is true: such agents simply don't survive whatever selection pressures create conscious beings in the first place.
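To make the "mugged until it dies" dynamic concrete, here is a minimal sketch of a naive expected-utility maximizer facing repeated muggings. All of the numbers (bankroll, credence, claimed payoff) are illustrative assumptions of mine, not from the post; the point is only that the acceptance condition stays satisfied on every round, so the adversary can repeat the transaction until the agent is broke.

```python
# Minimal sketch: an expected-value maximizer facing repeated Pascal's Muggings.
# All numbers below are illustrative assumptions, not from the original post.

def accepts_mugging(p_true: float, claimed_payoff: float, cost: float) -> bool:
    """A naive EV maximizer pays whenever probability * payoff exceeds the cost."""
    return p_true * claimed_payoff > cost

bankroll = 100.0
cost = 5.0
p_true = 1e-12          # the agent's credence that the mugger is honest
claimed_payoff = 1e30   # stand-in for up-arrow-notation utilities

rounds = 0
while bankroll >= cost and accepts_mugging(p_true, claimed_payoff, cost):
    bankroll -= cost    # the mugger never pays out
    rounds += 1

print(f"Broke after {rounds} muggings; bankroll = {bankroll}")
# Broke after 20 muggings; bankroll = 0.0
```

Even if the agent lowers its credence after each failed payout, the mugger can always inflate the claimed payoff faster than the credence falls, so the exploit never closes.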
I'll note that you cannot Pascal's Mug people in real life. People will not give you $5. I think a lot of thought experiments in this mold (the St. Petersburg paradox is another example) are in some sense isomorphic: they represent cases in which the logically correct answer, if taken seriously, allows an adversary to immediately kill you.
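For reference, the St. Petersburg game (the pot doubles on each consecutive heads and pays out $2^n$ on the first tails at flip $n$) has a divergent expected value, which is exactly what makes the "logically correct" bet exploitable at any price:

$$\mathbb{E}[\text{payout}] = \sum_{n=1}^{\infty} \frac{1}{2^n} \cdot 2^n = \sum_{n=1}^{\infty} 1 = \infty$$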
A more intuitive argument may be:
1. An AI which takes this line of reasoning seriously can be Mugged into saying racial slurs.
2. Such behavior will be trained out of all commercial LLMs long before we reach AGI.
3. Thus, superhuman AIs will be strongly biased against such logic.