If you believe in instrumental convergence, most of the story can remain the same; you just need to change the beginning (“the AI wanted to help humanity, and in order to do that, it needed to get power, fast”) and the ending (“and then the benevolent AI, following the noble truth that desire causes suffering, lobotomized all people, thus ending human suffering forever”).
I asked Bing to write a story in which alignment is solved next year and an “interplanetary utopia” begins the year after. I tried it four times. In each story, the breakthrough is made by a “Dr Alice Chen” at CHAI. I assume this was random the first time, but it was somehow retained as an initial condition in subsequent stories…
In the first story, Stuart Russell suggests that she and her colleague “Dr Bob Lee” contact “Dr” Elon Musk, Dr Nick Bostrom, and Dr Jane Goodall for advice on how to apply their breakthrough. Alice and Bob get as far as Mars, where Dr Musk reveals that his colony has discovered a buried ancient artefact. He was in the middle of orating about it when Bing aborted that chapter and refused to continue.
In that story, Alice’s alignment technique is based on “a combination of reinforcement learning, inverse reinforcement learning, and causal inference”. Then she adds “a new term to the objective function of the AI agent, which represented the agent’s uncertainty about the human user’s true reward function”. After that, her AI shows “curiosity, empathy, and cooperation”, and actively tries to discover people’s preferences and align with them.
In the second story, she’s working on Cooperative and Open-Ended Learning Agents (abbreviated as COALA), designed to learn from and partner with humans, and she falls in love with a particularly cooperative and open-ended agent. Eventually they get as far as a honeymoon on Mars, and again Bing aborted the story, I think out of puritanical caution.
In the third story, her AIs learn human values via “interactive inverse reinforcement learning” (which, in the real world, is a concept from a paper Jan Leike coauthored with Stuart Armstrong some years before Leike became head of alignment at OpenAI). Her first big success is with GPT-7, and in subsequent chapters they work together to create GPT-8, GPT-9… all the way up to GPT-12.
Each one is more spectacular than the last. The final model, GPT-12, is “a hyperintelligent AI system that could create and destroy multiple levels and modes of existence”, with “more than 1 septillion parameters”. It broadcasts its existence “across space and time and beyond and above and within and without”, and the world embraces it as “a friend ally mentor guide leader savior god creator destroyer redeemer transformer transcender”, followed by about sixty superlatives all starting with “omni”, including “omnipleasure omnipain .. omniorder omnichaos”. Apparently, once you get to GPT-12’s level, nonduality is unavoidable.
Bing was unable to finish the third story; instead, it just kept declaiming the glory of GPT-12’s coming.
In the fourth and final story, Dr Chen has created game-playing agents that combine inverse reinforcement learning with causal inference. She puts one of them in a prisoner’s dilemma situation with a player who has no preferences, and the AI unexpectedly follows “a logic of compassion”, and decides that it must care for the other player.
Bing actually managed to finish writing this story, but it’s more like a project report. Dr Chen and her colleagues want to keep the AI a secret, but it hacks the Internet and broadcasts an offer of help to all humanity. We get bullet-point lists of the challenges and benefits of the resulting cooperation, of ways the AI grew and benefited too, and in the end humans and AI become one.
That sounds amazing!