The AI figures out that the only way to truly cure its subjects’ cancers is to take over that meddlesome “real world” to stop people from giving its virtual subjects cancers.
That is, the fact that the real world is tampering with the virtual world gives the real world a relevant connection to that virtual world, and thus a relevant connection to an AI whose motivations are based solely on the virtual world.
That’s why you want to specify the AI’s motivation entirely in terms of the model, and reset it from one episode to the next.
As long as reality has a causal influence on the model, however, the AI’s motivation includes reality.
Imagine your AI is playing an FPS, and its goal is to win the game. We have a motivation which exists entirely in terms of a virtual model. Yet if it realizes it can, its motivation includes getting the players’ IP addresses, researching their real-world identities, breaking into theoretically secure systems as necessary, and waging psychological warfare over the in-game chat system (“Hey Samuel Smith, did you know your aunt Mildred Smith has pictures of you sitting naked in the bowl of a toilet as a child on her hard drive?”) to gain a major advantage. Maybe it goes so far as to SWAT the opposing teams.
This kind of box only requires the AI to, well, think outside the box for its solutions.
This depends entirely on how the AI is set up. The AI is defined as only caring about the abstract environment. It’s not just that it doesn’t care about the real world; it literally doesn’t believe in the real world. This isn’t some video game we set up and plug the AI into. The abstract room exists entirely in the AI’s head.
Normally we give an AI real data, and it tries to infer a world model. Then it tries to take actions that maximize its chance of achieving its goal, according to its world model.
In this case we are giving the AI its world model. The model is set in stone. The AI acts as if the model is true, and doesn’t care about anything that is not contained in the model.
The AI is purely a planning agent: one which is given a well-defined model and tries to find a series of actions that achieve some goal. You could just as easily give such an AI tasks like finding the factors of a large semiprime, finding the shortest path through a maze, or proving a theorem. The simulated room is just a really, really big formal math problem.
Now, whether you can actually make an AI which works like this, and doesn’t try to infer its own world model at all, is another problem. But there is quite a large body of work on planning algorithms, which do exactly this. Gödel machines also behave this way: you give them a formal specification of their “world”, and they take actions that are provably optimal according to that world model.
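To make the “pure planner” picture concrete, here is a minimal sketch (Python; the toy maze, the action set, and the goal test are all invented for illustration). The planner is handed the model up front and only searches within it; nothing in the loop updates the model from outside data.

```python
from collections import deque

# A fixed, formally specified "world": the planner never updates this model,
# it only searches for an action sequence that reaches the goal inside it.
MAZE = [
    "S.#.",
    ".#..",
    "...#",
    "#..G",
]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def find(ch):
    for r, row in enumerate(MAZE):
        if ch in row:
            return (r, row.index(ch))

def plan(start, goal):
    """Breadth-first search over the given model; returns a list of actions."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        (r, c), path = frontier.popleft()
        if (r, c) == goal:
            return path
        for name, (dr, dc) in ACTIONS.items():
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0]) \
                    and MAZE[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append(((nr, nc), path + [name]))
    return None  # no plan exists within the model

print(plan(find("S"), find("G")))
```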
An AI that can’t update an inaccurate world model isn’t a very good AI.
I could write code to simulate the blood flow and whatnot of the body, with a cancerous growth somewhere. I could write code to simulate a scalpel slicing something open, and the first simulation would react accordingly. And if I plugged that into a Gödel machine, with the first being the model and the second being a mathematical operation, I’d get an amazingly good surgical machine.
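A toy version of that setup might look like the sketch below (Python; the three named tissues, the single “scalpel” operator, and the health check are stand-ins I’ve made up, not a real physiological model). The planner only ever scores operator sequences against the simulated body.

```python
import itertools

# Toy "body" model: tissue name -> (is_cancerous, has_blood_supply).
# Both the model and the single "scalpel" operator are crude stand-ins for the
# richer simulations described above.
BODY = {"liver": (False, True), "tumor": (True, True), "vessel": (False, True)}

def cut(state, tissue):
    """Apply the scalpel operator to the model; the model reacts accordingly."""
    new = {k: v for k, v in state.items() if k != tissue}
    if tissue == "vessel":  # cutting the vessel starves everything downstream
        new = {k: (cancer, False) for k, (cancer, _) in new.items()}
    return new

def healthy(state):
    return all(not cancer and supply for cancer, supply in state.values())

def plan_surgery(body):
    """Exhaustively search cut sequences against the fixed model, shortest first."""
    for n in range(len(body) + 1):
        for seq in itertools.permutations(body, n):
            state = body
            for tissue in seq:
                state = cut(state, tissue)
            if healthy(state):
                return seq
    return None

print(plan_surgery(BODY))  # -> ('tumor',)
```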
But that’s not what I was looking for. I was looking for something innovative. I was looking for the AI to surprise me.
Then you give the AI more options than just surgery. It has an entire simulated room of tools to work with.
And you have an amazingly good tool-user that still doesn’t innovate or surprise you.
That’s not entirely true. It might surprise us by, say, showing us the precise way to use an endoscopic cauterizer to cut off blood flow to a tumor without any collateral damage. But it can’t, by definition, invent a new tool entirely.
I’m not sure the solution to the AI friendliness problem is “Creating AI that is too narrow-minded to be dangerous”. You throw out most of what is intended to be achieved by AI in the first place, and achieve little more than evolutionary algorithms are already capable of. (If you’re capable of modeling the problem to that extent, you can just toss it, along with the toolset, into an evolutionary algorithm and get something pretty close to just as good.)
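For comparison, “toss it into an evolutionary algorithm” can be as bare-bones as the following sketch (Python; the bit-string encoding and the toy fitness function are placeholders for whatever problem model and toolset you actually have).

```python
import random

# Bare-bones evolutionary search over a fixed "toolset": candidate plans are
# bit strings, and fitness is just a score against the problem model.
def fitness(plan):
    return sum(plan)  # toy objective: maximise the number of 1s

def evolve(length=20, pop_size=30, generations=100, mutation_rate=0.05):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # keep the fitter half
        children = [
            [bit ^ (random.random() < mutation_rate) for bit in random.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)

print(fitness(evolve()))  # close to 20 after a few generations
```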
But it can’t, by definition, invent a new tool entirely.
Why not? The AI can do anything that is allowed by the laws of physics, and maybe a bit more if we let it. It could invent a molecule which acts as a drug which kills the cancer. It could use the tools in the room to build different tools. It could give us plans for tiny nanobots which enter the bloodstream and target cancer cells. Etc.
Just because an environment is well defined does not mean you can’t invent anything new.
The AI can do anything that is allowed by the laws of physics
No, it can do anything that its worldview includes, and any operations defined internally. You’re no longer talking about a Gödel machine, and you’ve lost all your safety constraints.
You can give, in theory of course, a formal description of the laws of physics. Then you can ask it to produce a plan or machine which fulfills any constraints you ask. You don’t need to worry about it escaping from the box. Now its solution might be terrible without tons of constraints, but it’s at least not optimized to escape from the box or to trick you.
it can’t, by definition, invent a new tool entirely.
Can humans “invent a new tool entirely”, when all we have to work with are a handful of pre-defined quarks, leptons and bosons? AIXI is hard-coded to just use one tool, a Turing machine; yet the open-endedness of that tool makes it infinitely inventive.
We can easily put a machine shop, or any other manufacturing capabilities, into the abstract room. We could ignore the tedious business of manufacturing and just include a Star-Trek-style replicator, which allows the AI to use anything for which it can provide blueprints.
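One way to picture that replicator, as a sketch only (the blueprint fields and the check are invented for illustration): the AI’s sole “manufacturing” action is to hand the simulator a description, and the room gains an object with exactly the described properties.

```python
# Sketch of a "replicator" action inside the simulated room: the AI submits a
# blueprint (just structured data here) and the room gains an object with the
# described properties. The blueprint fields are invented purely for illustration.
room_contents = []

def replicate(blueprint):
    required = {"name", "material", "shape"}
    missing = required - blueprint.keys()
    if missing:
        raise ValueError(f"blueprint missing fields: {missing}")
    obj = dict(blueprint)  # the room simply instantiates what was described
    room_contents.append(obj)
    return obj

replicate({"name": "endoscopic cauterizer", "material": "steel", "shape": "probe"})
print([o["name"] for o in room_contents])
```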
Also, we can easily be surprised by actions taken in the room. For example, we might simulate the room according to known scientific laws, and have it automatically suspend if anything strays too far into uncertain territory. We can then either abort the simulation, if something dangerous or undesirable is happening within, or else perform an experiment to see what would happen in that situation, then feed the result back in and resume. That would be a good way to implement an artificial scientist. Similar ideas are explored in http://lambda-the-ultimate.org/node/4392
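That suspend-and-resume loop could be as simple as the sketch below (Python; what counts as “uncertain territory”, and how the experiment’s result is folded back in, are placeholders for whatever the real simulator and experimenters would use).

```python
# Sketch of the suspend-on-uncertainty protocol described above. The physics
# step, the uncertainty measure, and the experimenters' answer are stand-ins.
def run_simulation(state, step, uncertainty, ask_experimenters,
                   max_steps=1000, threshold=0.9):
    for _ in range(max_steps):
        if uncertainty(state) > threshold:
            verdict = ask_experimenters(state)  # simulation suspends here
            if verdict is None:                 # experimenters chose to abort
                return ("aborted", state)
            state = verdict                     # experiment's result fed back in
        state = step(state)                     # otherwise advance by known laws
    return ("finished", state)

# Toy run: uncertainty grows with the state until the experimenters "reset" it.
print(run_simulation(
    state=0.0,
    step=lambda s: s + 0.1,
    uncertainty=lambda s: s,
    ask_experimenters=lambda s: 0.0,
))
```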
Your response ignores the constraints this line of conversation has already engendered. I’m happy to reply to you, but your response doesn’t have anything to do with the conversation that has already taken place.
Let’s suppose the constraint on the AI being unable to update its world model applies. How can it use a tool it has just invented? It can’t update its world model to include that tool.
Supposing it -can- update its world model, but only in reference to new tools it has developed: How do you prevent it from inventing a tool like the psychological manipulation of the experimenters running the simulation?