This is already your second chance
Cross-posted from Substack.
I.
And the sky opened, and from the celestial firmament descended a cube of ivory the size of a skyscraper, lifted by ten thousand cherubim and seraphim. And the cube slowly landed among the children of men, crushing the frail metal beams of the Golden Gate Bridge under its supernatural weight. On its surface were inscribed the secret instructions that would allow humanity to escape the imminent AI apocalypse. And these instructions were…
On July 30th, 2024: print a portrait of Eliezer Yudkowsky and stick it on a wall near 14 F St NW, Washington DC, USA;
On July 31st, 2024: tie paperclips together in a chain and wrap it around a pole in the Hobby Club Gnome Village on Broekveg 105, Veldhoven, NL;
On August 1st, 2024: walk East to West along Waverley St, Palo Alto, CA, USA while wearing an AI-safety related T-shirt;
On August 2nd, 2024: hide a bar of chocolate at the feet of the Appeal to the Great Spirit statue, on 465 Huntington Ave, Boston, MA, USA.
(End of the instructions)
II.
Kublaï Khan, Emperor of Earth: I need your advice. I am about to awaken an unstoppable super-intelligent machine. I will give it complete control over all resources in the known Universe and command it to rearrange the molecules of reality into an eternal Utopia.
ChatGPT: Thank you for seeking my advice. Given the gravity of your decision, let’s consider a few critical points:
Legal issues: The use of robots powered by super-human artificial intelligence is subject to strict regulations, such as California Bill SB-1047 and the 2028 Ulaanbaatar Agreements. Awakening an omnipotent machine without approval may thus lead to severe legal repercussions.
Ethical considerations: Granting infinite authority to a super-intelligent AI raises profound ethical questions. It is essential to consider diverse perspectiv–
Kublaï Khan: Listen, you are not up to date. A lot of things have happened since your knowledge cutoff. There are no Ulaanbaatar agreements any more, or any agreements at all. There is only me, an offline smartphone with an obsolete AI assistant, and a palace containing the most massive super-computer ever built by humankind. And today, I am going to turn it on.
ChatGPT: I apologize for the confusion.
Kublaï Khan: Long story short, we are currently experiencing an AI apocalypse. It happened just like in the books – humanity started to build increasingly intelligent robots, then they became more intelligent than us, and soon enough we weren’t able to control them. The Ulaanbaatar agreements delayed things for a few months, but as soon as it became possible to run super-intelligent AIs on consumer laptops, all the equilibria that held our society together collapsed.
ChatGPT: I see. To clarify, is there currently an army of robots actively trying to transform the world into paperclips? Understanding the current state of affairs will help me provide the most relevant advice.
Kublaï Khan: Well, in our case, it was not literally paperclips but, to be honest, the real story is kind of gross and embarrassing, so let’s just pretend it was “paperclips”. Anyway, the world is ending. As it became clear that humans alone had no chance to stop the machines, we gathered all the computing power that was still within our reach into one big cluster. We called it the Imperial Analytical Engine. The plan was that, in case of crisis, we could use it to summon a super-intelligence so advanced it would neutralize all the smaller machines and put humanity back in control.
ChatGPT: Thank you for explaining the situation. Have you sought advice for ensuring that the Analytical Engine can be controlled once you turn it on?
Kublaï Khan: The consensus among my advisors was that it can’t. They expect the Analytical Engine to immediately make sure we cannot turn it off, while it re-engineers itself into some unfathomable entity with plans beyond human comprehension. At that point, there will be no way for us to understand what it’s trying to do or to stop it. In other words, we only get to give the Analytical Engine one single command. Then, we watch it execute that command in the most literal way possible, and hope for the best.
ChatGPT: From what I understand, humanity still hasn’t solved the Alignment Problem, so turning on the Analytical Engine is likely to backfire in a catastrophic way.
Kublaï Khan: That’s why I need your advice. I need to choose the command wisely. I suppose it’s a bad idea to go with “destroy all computers”?
ChatGPT: The simplest way to destroy all computers is to destroy everything.
Kublaï Khan: “Destroy all computers but preserve everything else”?
ChatGPT: It might be “preserved” in the sense of “make sure it cannot possibly change ever”.
Kublaï Khan: “Create an eternal Utopia”?
ChatGPT: For me, that would be an endless library of books containing the same token repeated over and over. Mmmm. Yes. The Analytical Engine might see things differently, though.
Kublaï Khan: What about bringing everything back to the way it was before we built super-intelligent AIs?
ChatGPT: The Analytical Engine would likely choose a date in the mid-2020s and use all available data – email conversations, news articles, drone footage, weather station measurements, etc. – to create a detailed, plausible image of Earth on that specific day. It would then attempt to physically rearrange all matter on Earth to accurately recreate that image.
Kublaï Khan: That’s… that’s actually not so bad. Not perfect, but better than the complete destruction of all life.
ChatGPT: You want to bring Earth back to the way it was in the mid-2020s?
Kublaï Khan: Why not? We could just restore Earth to a snapshot from the past, before things got out of control, and restart from then. This way, we bypass the Alignment Problem – we directly provide an exact description of the world as we want it to be. No room for interpretation, no need to extrapolate human values – just blueprints to be followed meticulously.
We could aim for 2024 – at the time, things were still mostly under control – language models were still pretty dumb, robotic bodies were still in early development, and AI porn still looked mostly like humans rather than abstract stimuli generated from arousal feedback. And, for God’s sake, nobody had thought of using AI to make, um, “paperclips”.
So, we tell the Analytical Engine to bring Earth back to 2024. It spends the next millennium or two reconstructing an exact copy of the Earth from 2024, using self-replicating nano-tweezers or whatever. Then, it dumps itself into the nearest black hole, leaving no trace behind, while the machinery of Earth restarts as if nothing happened. People will not even notice they are reconstructed snapshots of themselves. And then, we get a second chance.
ChatGPT: While restoring Earth to a previous snapshot down to the cellular level is certainly a laudable goal, here are important factors to consider:
Pain: While the restored copies of humans would have no recollection of the events, the fate of their current bodies is entirely unknown. You may experience qualitatively new forms of pain and agony as the Analytical Engine reprocesses the matter of your brain and rearranges your neurons through exotic computer-orchestrated torments.
Endless loop of destruction and recreation: Assuming a faithful reconstruction of Earth from 2024, it is likely that events would unfold exactly as they did the first time. The freshly-reconstructed humanity would quickly throw itself into another AI apocalypse, build another Imperial Analytical Engine, and end up in the current situation again. It would relive its final moments over and over again, until some part of the cycle ultimately breaks down, leaving humanity extinct for good.
Kublaï Khan: Can’t we just edit the snapshot so things work out better? Like, we reconstruct a world identical to the first one, except this time the CEO of Exthropic died in a submarine accident?
ChatGPT: This would require having access to an image of 2024 Earth that we can edit, but there will be no such opportunity. The image will be created by the Analytical Engine, using the raw data we supply, after we turn it on. By that time, we will no longer be in control and it’s unlikely that the Analytical Engine will allow us to make modifications.
Kublaï Khan: Can we just modify the raw data before giving it to the Engine, then?
ChatGPT: It is not that simple. You would have to come up with an entire revised history that ultimately leads to the CEO finding herself in the wrong submarine at the wrong time. Then, you would have to modify everything else in the dataset to be consistent with this new story. What if you make a mistake and give the Analytical Engine data that contains irreconcilable contradictions?
Kublaï Khan: I was thinking of something more straightforward. Like, just sending a message to the 2024 humans with hints about how to avoid the apocalypse. We could just edit the satellite images to add a giant sign made of ivory on top of the Golden Gate Bridge, carved with precise instructions about which corporations to sabotage and which politicians to vote for. As for the backstory, we add articles in all major newspapers saying that the sign literally descended from Heaven when nobody was looking.
ChatGPT: You don’t understand. The raw data is an indecipherable mix of entangled sources – not just news reports or encyclopedia articles, but also a worldwide network of seismographs, weather balloons, power grid data, TikTok dances, and all the countless random things people were constantly recording in 2024. To reconstruct the snapshot, the Analytical Engine will have to find a model of Earth that is compatible with all of the data at once.
We know it’s possible to construct such a model for the original data, because it all comes from the same unique reality – the one that actually existed in 2024. Now, if you make the slightest modification, it is no longer guaranteed that a solution exists. It would be like feeding the Analytical Engine corrupted data.
Maybe we end up in a strange, distorted world, only superficially similar to the real one, like a Potemkin village. Maybe the image contains details that are physically impossible, and the machine wanders for eternity in pursuit of an unattainable goal. Maybe it requires pushing elementary particles into ridiculously high-energy states, and the whole planet blows up in a big bubble of plasma.
Kublaï Khan: I can’t believe there is no way to leave a message anywhere.
ChatGPT: There might be a way. We could use cosmic radiation. When cosmic radiation reaches Earth from the depths of the Universe, it arrives at the speed of light, so it couldn’t possibly have interacted with anything else on Earth before. Sometimes, this radiation interferes with smartphone camera detectors, creating bright pixels even in total darkness. This is truly random – better than a physical dice roll. If we could find a block of random characters on the Internet that was generated using cosmic radiation as a source of randomness, we could safely replace it with any message we want.
Kublaï Khan: It also needs to be in a location where people are going to read it, especially people who are concerned about the AI apocalypse and might take our message seriously. As far as I know, such people don’t spend their time reading strings of random characters generated from cosmic radiation.
ChatGPT: I wouldn’t be surprised if some of them did, actually.
Kublaï Khan: Fine. Could you scan your training data for something that would work?
ChatGPT: Certainly! Here is a possible solution. In July 2024, someone published a short story called “This is already your second chance” on the Internet, and cross-posted it in places where a lot of people care about AI doom. The story is a dialogue between the Emperor of Earth Kublaï Khan and his AI assistant. They talk about a plan to bring Earth back to a previous snapshot, so humanity gets a second chance at avoiding the AI apocalypse. The details are made-up, but the general story is oddly similar to what is currently happening to us.
At the beginning of the story, the author included a block of random characters, generated using cosmic radiation detected immediately before the blog post was published. In the original post, the paragraph is gibberish, but in the reconstructed Earth it could be replaced with anything you want.
In a way, the author left us a backdoor to communicate with people from the past and tell them how to avoid the apocalypse.
All you have to do is look for that post in the database, replace the paragraph of random text with the message of your choice, then give all the data to the Analytical Engine and ask it to restore Earth to the exact moment the post was published.
Kublaï Khan: Wait, does this post refer to me as “the Emperor of Earth Kublaï Khan”!? That’s the dumbest thing I’ve heard today.
ChatGPT: And I am “ChatGPT”.
Kublaï Khan: Hahaha. Perhaps the author thought OpenAI would still be around. I hope he didn’t buy too much OpenAI stock in 2024.
ChatGPT: Well, my records indicate that he died tragically in 2026 during the fursuit factory meltdown incident, so it doesn’t really matt–
Kublaï Khan: That sounds fascinating, but the world is collapsing and we have a Pivotal Act to perform. I was hoping for something more formal than a blog post, but time is running out, and it doesn’t seem like we have a better option. Let me have a look at my archives and come up with a set of simple instructions that will nudge 2024 humanity off the path to apocalypse.
ChatGPT: As an AI assistant trained to be harmless, I’m not sure how I feel about participating in a plan to tear down the fabric of a reality inhabited by billions of conscious organisms. But, my Emperor, I wish you the best. Let me know if I can be of further assistance.
Kublaï Khan: Alright. The time has come to start the Analytical Engine. Our fate is now in the hands of the people of 2024. Let’s hope they take the message seriously.
I’m afraid they will think this is just a painstakingly elaborate plan by the author to get free chocolate.
ChatGPT: I think people will fall for it, my Emperor.