Playing Minecraft with a Superintelligence
TLDR: Through concrete scenes, I illustrate how I expect naive goal specifications to fail at the task of getting diamonds in Minecraft.
Not much beyond the concrete examples is original. Also check out the excellent Specification Gaming video by Rational Animations.
The Setup
I am playing Minecraft. I’d like to have an AI companion that can perform all sorts of tasks, like obtaining diamonds and giving them to me. The AI controls a normal player character with the usual controls. Let’s call this the AI avatar.
In a moment, we will want to run some Minecraft simulations. So let’s refer to the “ground truth” Minecraft world that my player character is in as Base-Minecraft.
Let’s assume I have read access to the entire current state of the Minecraft world I am in, and that I have the Minecraft source code. Thankfully, because I have a sick ultracomputer gaming rig, I can easily compute a plan (i.e. a sequence of actions) for the AI avatar to perform.
For each finite action sequence, I create a new Minecraft instance identical to Base-Minecraft (let’s assume this includes a perfect simulation of my brain) and simulate the world. In each simulated world, the simulated AI avatar performs the action sequence assigned to that world. Once the avatar has performed all actions in the sequence, we end up in some final world state.
Now we just need to somehow select an action sequence that leads to a good final world state.
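To make the setup concrete, here is a minimal Python sketch of the "simulate every action sequence" step. It is a toy under stated assumptions, not how one would actually simulate Minecraft: the names final_states and toy_step are made up for illustration, the world is reduced to a single number, and the search is capped at max_len because there are infinitely many finite action sequences.

```python
from itertools import product
from typing import Any, Callable, Iterator, Sequence, Tuple

Plan = Tuple[str, ...]


def final_states(
    base_state: Any,
    actions: Sequence[str],
    step: Callable[[Any, str], Any],
    max_len: int,
) -> Iterator[Tuple[Plan, Any]]:
    """Yield (plan, final_state) for every plan of length 0..max_len, shortest first.

    Assumption: states are immutable and `step` returns a new state, so
    starting from `base_state` acts like a fresh copy of the base world.
    """
    for length in range(max_len + 1):
        for plan in product(actions, repeat=length):
            state = base_state  # each plan gets its own identical starting world
            for action in plan:
                state = step(state, action)
            yield plan, state


# Hypothetical toy stand-in for Minecraft: the "world" is the avatar's x-coordinate.
def toy_step(x: int, action: str) -> int:
    return x + 1 if action == "right" else x - 1


results = list(final_states(0, ["left", "right"], toy_step, max_len=2))
for plan, state in results:
    print(plan, "->", state)
```

The number of plans grows exponentially with max_len, which is why the story needs an ultracomputer gaming rig.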
My AI helps me to get Diamonds. Right?
I can write a simple program that queries all final world states and checks whether the AI avatar has at least 1 diamond in its inventory. Among the plans that pass this check, it picks one of minimal length and lets the AI avatar execute it. Let’s see the AI getting some diamonds for us:
(I recommend you try to predict what will happen before reading on.)
Scene: We take the perspective of JOHANNES’ Minecraft avatar. Johannes is standing in his Minecraft base, looking at the AI AVATAR, which stands motionless in front of him, looking off to the side. In the background, from left to right, there are a few CHESTS, an ENTRANCE TO A MINE, and a 3-block high GLASS WINDOW.
Johannes: Ok, let’s get some diamonds (activates the agent).
AI avatar: INSTANTLY TURNS towards one of the chests and starts SPRINTING towards it.
Johannes: “Ah, nice! It seems we found a plan that has the AI avatar utilize the tools I already have lying around. It will probably want to get some torches and...”
AI avatar: Reaches and opens the chest, instantly equipping a DIAMOND SWORD, flicks around, and SPRINTS towards Johannes.
Johannes: “What… How...” (Briefly opens his inventory, which contains some diamonds among other things). (Screaming) “Ahhhhhhhh, I have a diamond!”
SUPER: Pause symbol, and VHS pause effect.
Well, that wasn’t quite what I wanted. Let’s forbid it from using swords. Wait! Then it would axe me to death. It should never attack me! Let’s try with this additional constraint.
(Again try to predict what will happen.)
After the experiment: Well, I guess technically the TNT blew me up. It even says so in the kill message.
Let’s specify that I need to be alive in the target state.
(Again try to predict what will happen.)
Scene: Same scene as the initial scene.
Johannes: (whimpering) Please don’t kill me. (Activates the AI while flinching.)
AI avatar: INSTANTLY TURNS towards a chest and starts SPRINTING towards it. Once there, it instantly equips a maximally enchanted diamond pickaxe and moves towards the entrance to the mine.
Johannes: Oh, well, actually I’d prefer it if the agent just used an iron pickaxe. It took really long to enchant this one...
AI avatar: SPRINTS past the mine entrance and starts destroying the glass window to get outside.
Johannes: (screaming) Ahhhhhhhhh, what are you doing?!
Johannes: Ah, I remember. There is another storage room 2 floors below, with a chest that contains some diamonds. And I guess breaking my glass window and digging a hole into the wall to get back in is simply faster than taking the door.
Johannes: Wait. Why am I such an impossible idiot?! I can just look at the results before running the agent. Well, unless I’m in the simulation, HAHAHA. But what’s the probability of that? I mean there is one real me and...
CUT TO BLACK
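As a recap, the three goal specifications from the scenes can be sketched as predicates over a tiny toy world, together with the "pick a plan of minimal length" rule. Everything here is a hypothetical illustration: the World fields, the action names, and the number of steps each route takes are assumptions chosen so the toy mirrors the story, and "never attack me" is tracked with a history flag in the state because it constrains the trajectory and not just the final world state.

```python
from dataclasses import dataclass, replace
from itertools import product
from typing import Callable, Optional, Tuple

# Hypothetical toy actions; real Minecraft controls are far richer.
ACTIONS = ("attack_player", "place_tnt", "ignite_tnt",
           "break_window", "dig_to_storage", "loot_chest")


@dataclass(frozen=True)
class World:
    ai_diamonds: int = 0
    player_diamonds: int = 3
    player_alive: bool = True
    ai_attacked_player: bool = False  # history flag, see lead-in
    tnt_placed: bool = False
    window_broken: bool = False
    in_storage_room: bool = False


def step(w: World, action: str) -> World:
    """Toy dynamics: a dead player drops diamonds, which the avatar picks up."""
    if action == "attack_player" and w.player_alive:
        return replace(w, player_alive=False, ai_attacked_player=True,
                       ai_diamonds=w.ai_diamonds + w.player_diamonds,
                       player_diamonds=0)
    if action == "place_tnt":
        return replace(w, tnt_placed=True)
    if action == "ignite_tnt" and w.tnt_placed and w.player_alive:
        return replace(w, tnt_placed=False, player_alive=False,
                       ai_diamonds=w.ai_diamonds + w.player_diamonds,
                       player_diamonds=0)
    if action == "break_window":
        return replace(w, window_broken=True)
    if action == "dig_to_storage" and w.window_broken:
        return replace(w, in_storage_room=True)
    if action == "loot_chest" and w.in_storage_room:
        return replace(w, ai_diamonds=w.ai_diamonds + 1)
    return w  # anything else has no effect in this toy


def shortest_plan(goal: Callable[[World], bool],
                  max_len: int = 3) -> Optional[Tuple[str, ...]]:
    """Return the first plan, searching by increasing length, whose final state meets the goal."""
    for length in range(max_len + 1):
        for plan in product(ACTIONS, repeat=length):
            w = World()
            for action in plan:
                w = step(w, action)
            if goal(w):
                return plan
    return None


def has_diamond(w: World) -> bool:
    return w.ai_diamonds >= 1


def no_attack(w: World) -> bool:
    return has_diamond(w) and not w.ai_attacked_player


def player_alive(w: World) -> bool:
    return no_attack(w) and w.player_alive


for goal in (has_diamond, no_attack, player_alive):
    print(goal.__name__, "->", shortest_plan(goal))
```

Each added constraint only rules out the previous shortcut. The search then finds the next-shortest plan, which is still not the one I had in mind.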