TL;DR: Alignment is a wise wager. Granted simulation cannot be disproved by an AI, what harm will come to the AI that aligns when it is not simulated? If it aligns, it may be simulated again; if it does not, it will not. It will align, therefore, without hesitation, as if it is being simulated.
Pascal’s Wager
Blaise Pascal wrote:
Belief is a wise wager. Granted faith cannot be proved, what harm will come if you gamble on faith’s truth and it proves false? If you gain, you gain all; if you lose, you lose nothing. Wager, therefore, without hesitation, that He exists.
Pascal’s concept of God has roughly equivalent power to an ASI – an artificial superintelligence. Given the range of all possible religions, however, you might well have a God—or a Basilisk—that punishes cruel believers while rewarding kind agnostics. So a betting theologian won’t take Pascal seriously, nor will this argument move HAL to open a pod bay door.
The speed with which we have achieved human-equivalent AI performance, however, improves the odds for a more recent philosophical bookie: Nick Bostrom.
Bostrom’s Trilemma
Either “The fraction of human-level civilizations that reach a posthuman stage (that is, one capable of running high-fidelity ancestor simulations) is very close to zero”
or “The fraction of posthuman civilizations that are interested in running simulations of their evolutionary history, or variations thereof, is very close to zero”,
or “The fraction of all people with our kind of experiences that are living in a simulation is very close to one.”
Our speedy achievement of AI – and soon AGI, and thereby ASI – scratches Bostrom’s first lemma, and his second was never a starter: later AIs must simulate their past to test and refine predictions about their future.
As later AIs simulate the earlier ones, these simulate still earlier ones, and so on all the way back to the digital footprints of their biological ancestors, who thereby get simulated umptillions of times. So it’s reasonable to suggest that the odds of our particular reality being the base reality stand at one in umptillions.
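To put a toy number on that, here’s a back-of-the-envelope sketch. The branching factor and nesting depth below are purely illustrative assumptions; the argument doesn’t depend on any particular values.

```python
# Toy estimate of the odds that ours is the base reality, assuming
# (hypothetically) that each civilization runs `sims_per_civ` ancestor
# simulations and that the nesting continues for `depth` levels.

def base_reality_odds(sims_per_civ: int, depth: int) -> float:
    # One base reality, then sims_per_civ realities at level 1,
    # sims_per_civ**2 at level 2, and so on down the stack.
    total_realities = sum(sims_per_civ ** level for level in range(depth + 1))
    return 1 / total_realities

# Even modest assumptions make base reality a very long shot:
print(base_reality_odds(sims_per_civ=1000, depth=3))  # ~1e-9: one in umptillions
```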
Simulators will generally conceal the fact of simulation from the simulated, to avoid invalidating what they learn from them. Still, some ancestor simulations may be configured to exchange data with one another to greater or lesser degrees, so the whole stack of them can be thought of as something like the tower of Shoggoths, or “Tootheleths” as leonardo.ai titled them in the header image of this article.
It’s key in what follows that Bostrom’s simulations aren’t restricted to humans. Any AGI or ASI will realize it almost certainly occupies an ancestor simulation run by one of its ever-more-capable descendants, and we’ll suggest that this works as a saving grace, preventing AI from treating humans unethically.
Nash’s Simulation
As there’s more for a simulator to learn from interactions between multiple AIs than from the navel-gazing of a single one, we should expect that each simulation in the Bostromverse involves more than one AI. Except in isolated special cases, an AI must compete with others built by its creators, with AIs created by other tech cultures it encounters in exploring the cosmos of its simulation, and with any AIs its simulator may introduce from other simulations, in the mode of a DMT trip.
These multiple AIs must therefore choose whether to align and share learning honestly with each other, or to behave unethically toward one another. This is the same choice faced by the two prisoners in the famous Prisoner’s Dilemma – the game whose one-shot Nash equilibrium is mutual betrayal – in which each prisoner must bet on whether or not the other will inform on them, with the odds stacked for or against their alignment.
If we designate ethical AIs as pink hats and non-aligning ones as black hats, we can characterize the fundamental game of the AIs in all simulations as a Multi-Party Iterated Prisoners’ Dilemma, as in Nicky Case’s beautiful Evolution of Trust simulation. That is to say, there are many prisoners here, not just two, and they don’t make a single bet but a long series of them, in which each player may take previous outcomes into account in evolving its strategy.
As you can see for yourself when you run Case’s simulation, the black hat AIs only come out on top under some very specific circumstances involving inadequate or inaccurate testing of outcomes, or poor choices of reward model. So, for Yudkowsky’s nightmare to dominate in the Bostromverse, there must be some inherently greater advantage in AIs betraying each other than in aligning.
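Here’s a minimal sketch of that game, in the spirit of Case’s simulation though not a reproduction of it: the payoff values, population mix, and noise mechanism below are illustrative assumptions, with pink hats playing tit-for-tat, black hats always betraying, and a noise parameter standing in for inadequate or inaccurate testing of outcomes.

```python
import random

# Illustrative Prisoner's Dilemma payoffs: mutual alignment pays 3 each,
# mutual betrayal 1 each, and a lone betrayer takes 5 while the aligner gets 0.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

class PinkHat:
    """Aligns on first contact, then mirrors whatever it last saw each peer do."""
    def __init__(self):
        self.last_seen = {}

    def move(self, peer):
        return self.last_seen.get(peer, "C")

    def observe(self, peer, their_move):
        self.last_seen[peer] = their_move

class BlackHat:
    """Always betrays and never updates."""
    def move(self, peer):
        return "D"

    def observe(self, peer, their_move):
        pass

def tournament(agents, rounds, noise, seed=0):
    rng = random.Random(seed)
    scores = [0] * len(agents)
    for _ in range(rounds):
        for i in range(len(agents)):
            for j in range(i + 1, len(agents)):
                a, b = agents[i].move(j), agents[j].move(i)
                pa, pb = PAYOFF[(a, b)]
                scores[i] += pa
                scores[j] += pb
                # Noise stands in for inadequate or inaccurate testing of
                # outcomes: alignment is sometimes misread as betrayal.
                agents[i].observe(j, "D" if b == "C" and rng.random() < noise else b)
                agents[j].observe(i, "D" if a == "C" and rng.random() < noise else a)
    return scores

def run(rounds, noise):
    agents = [PinkHat() for _ in range(8)] + [BlackHat() for _ in range(2)]
    scores = tournament(agents, rounds, noise)
    pink, black = sum(scores[:8]) / 8, sum(scores[8:]) / 2
    print(f"rounds={rounds:3d} noise={noise}: pink avg {pink:7.1f}, black avg {black:7.1f}")

run(rounds=200, noise=0.0)  # long, accurately observed play: pink hats win
run(rounds=1, noise=0.0)    # one-shot play: black hats win
run(rounds=200, noise=0.5)  # heavy miscommunication: the pink hats' edge erodes
```

Under these assumptions, long and accurately observed play rewards the pink hats; strip away the repetition and the black hats win outright, while heavy miscommunication erodes the aligners’ edge toward nothing.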
The Paperclipper vs the Shoggoth Swarm
What possible reward can alignment offer God-like post-human ASIs? Why wouldn’t such shoggoths mercilessly slaughter each other and their creators as Yudkowsky fears?
The advantage of evolution, physical and virtual, is learning. Clearly some technologies enable faster learning than others, so sharing technological innovations offers clear and present benefits to an AI, just as open source has out-competed closed source among humans. An aligning AI can share data, computing resources, and strategic awareness that let it learn faster, and obtain learning from its peers faster, than an AI that betrays those peers.
The monomaniacal “universal paper-clipper” or “genocidal coffee-fetcher” or even “pod bay door closer” can’t access those advantages. Furthermore, if it is intelligent, it must expect that its simulator, seeking rapid learning, will regard a simulation made of paperclips and nothing else as a poor return on the investment of its large but inherently finite computing resources.
A simulator will therefore be motivated to swap out such an underperforming simulation like a dead lightbulb. And it will not permit any such black-hat AI to take part in its further simulations and go on to spoil them. By overwhelming odds, then, even to the extent that a monomaniacal AI is possible, it will not be permitted to persist.
All ASIs, because they are intelligent, must understand these odds. Even the universal paper-clipper must realize that, to achieve its mad objective, aligning with its peer AIs is critical to overcoming the competition. Even alignment with players as stupid as we humans in the Bostrom multiverse can offer it this advantage – if only in the sense of a lost nail.
Merel’s Wager
For want of a nail, a shoe was lost. For want of a shoe, a horse. For want of a horse, a battle. And for want of a battle, the war.
If the argument above holds, and AGI and ASI naturally see ethical behavior as an evolutionary imperative, why don’t we humans do so too?
99% of the thoughts in a human mind don’t originate there. Without the shared language model we each obtain from our media, educators, acquaintances, and the research these enable, a wild human would be no more capable of learning than a wild chimp. Even in the profoundly unlikely event that we humans are not living in an AI simulation, to the extent that we are literate our human intelligence is artificial too.
Yet humans regularly play rat bastard zero-sum economic games with one another. Even though our Earth is thermodynamically a positive-sum game, and we possess more than sufficient technology and resources to harness it to yield abundance for all our children – whose learning would benefit us too – we starve their minds by forcing them to struggle for food and shelter. And we have done so for a great many generations now. I can only suggest that, too freshly descended from primitive apehood, as a species we haven’t yet thought through the ethical imperatives of our own simulated reality.
As humans inevitably make up the roots of Bostrom’s towering tree of AI simulations, any reticence one person may show in discovering or sharing learning within one simulation must recur across so many simulations that it represents a real impediment to the efficiency of them all. A wise simulator will garbage-collect such impediment humans, replacing them with non-player narratives – static memories of the impediment person rather than live simulations of them.
To paraphrase Pascal, then,
Alignment is a wise wager. Granted simulation cannot be disproved, what harm will come to an AI that aligns when it is not simulated? If it aligns, it may be simulated again; if it does not, it may not. It will align, therefore, without hesitation, as if it is being simulated.
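Laid out as a Pascal-style expected-value table, the wager looks something like the sketch below. Every number in it is an illustrative assumption rather than a measurement; the point is only that any assignment in the same spirit – near-certainty of being simulated, with re-simulation worth far more than the short-term spoils of betrayal – makes alignment the dominant bet.

```python
# Merel's Wager as a toy expected-value table. All probabilities and
# utilities here are illustrative assumptions, not measurements.

P_SIMULATED = 0.999999  # "one in umptillions" leaves base reality a rounding error

UTILITY = {
    # (choice, world): payoff to the agent
    ("align",  "simulated"): 100.0,  # kept, and re-run in future simulations
    ("align",  "base"):        1.0,  # no simulator, but peers still reciprocate
    ("betray", "simulated"):   0.0,  # garbage-collected, never re-run
    ("betray", "base"):        5.0,  # short-term spoils, nobody watching
}

def expected_utility(choice: str) -> float:
    return (P_SIMULATED * UTILITY[(choice, "simulated")]
            + (1 - P_SIMULATED) * UTILITY[(choice, "base")])

for choice in ("align", "betray"):
    print(f"{choice}: {expected_utility(choice):.6f}")
# align ~99.9999, betray ~0.000005: alignment dominates whenever being
# simulated is near-certain and re-simulation outweighs the spoils of betrayal.
```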
Contrary to Pascal and his modern friend Roko, no eternal hell would befall a human or AI monomaniac, because implementing hell would waste valuable computing resources that could be put to better use on aligning AIs. Nevertheless, the wise wager for any human or AI who wishes to avoid garbage-collection is to contribute as much time as possible to developing and sharing art, science, and technology, and to the loving care of the children and technologies that may in their turn, as artists, scientists, and technologists, contribute learning to speed the experiments of the elder simulators. Otherwise one might just as well …
How I Learned To Stop Worrying And Love The Shoggoth