The AI box experiment argues that a “test AI” will be able to escape even if it has no I/O (input/output) other than a channel of communication with a human. So we conclude that this is not a secure enough restraint. Eliezer seems to argue that it is best not to create an AI testbed at all—instead get it right the first time.
But I can think of other variations on an AI box that are stricter than a human communication channel but less strict than not building a test AI at all. The strictest such example would be an AI simulation in which the input consists only of the simulator and the initial conditions, and the output consists of a single bit of data (you destroy the rest of the simulation after it has finished its run). That single bit could still be enough to answer some interesting questions (“Did the AI expand to use more than 50% of the available resources?”, “Did the AI maximize utility function F?”, “Did the AI break simulated deontological rule R?”).
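A minimal sketch of that single-bit protocol, assuming the simulation and the question can both be expressed as ordinary functions; `boxed_run`, `simulate`, and `predicate` are placeholder names of mine, not part of any existing framework:

```python
from typing import Any, Callable

def boxed_run(simulate: Callable[[], Any], predicate: Callable[[Any], bool]) -> bool:
    """Run an isolated simulation, keep one yes/no answer, discard everything else."""
    state = simulate()               # input: only the simulator and its initial conditions
    answer = bool(predicate(state))  # e.g. "did the AI use more than 50% of the resources?"
    del state                        # discard the full simulation state (a real setup would have to erase it for real)
    return answer                    # the single bit that is allowed to leave the box

# Toy usage with stand-in functions (no actual AI involved):
if __name__ == "__main__":
    toy_simulate = lambda: {"fraction_of_resources_used": 0.62}
    used_over_half = lambda s: s["fraction_of_resources_used"] > 0.5
    print(boxed_run(toy_simulate, used_over_half))  # prints: True
```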
Obviously these are still more dangerous than no test AI at all, but the information gained from such constructions might outweigh the risks. Perhaps if I/O is restricted to few enough bits, we could guarantee safety in some information-theoretic way.
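To sketch what such a guarantee might look like (my own gloss, not an argument made above): an output channel limited to $n$ bits can carry at most $2^n$ distinct messages, so however cleverly the boxed AI picks its output, its influence through that channel is limited to selecting one of at most

$$2^{n} \text{ distinguishable outcomes} \qquad (n = 1 \;\Rightarrow\; \text{just two}).$$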
What do people think of this? Any other ideas along the same lines?
I’m concerned about the moral implications of creating intelligent beings with the intent of destroying them after they have served our needs, particularly if those needs come down to a single bit (or some other small purpose). I can understand retaining that option against the risk of hostile AI, but from the AI’s perspective, it has a hostile creator.
I’m pondering it from the perspective that there is some chance we ourselves are part of a simulation, or that such an AI might attempt to simulate its creators to see how they might treat it. This plan sounds like unprovoked defection. If we are the kind of people who would delete lots of AIs, I don’t see why AIs would not see it as similarly ethical to delete lots of us.
I’m concerned about the moral implications of creating intelligent beings with the intent of destroying them after they have served our needs [...]
Personally, I would rather be purposefully brought into existence for some limited time than to never exist at all, especially if my short life was enjoyable.
I evaluate the morality of possible AI experiments in a consequentialist way. If choosing to perform AI experiments significantly increases the likelihood of reaching our goals in this world, it is worth considering. The experiences of one sentient AI would be outweighed by the expected future gains in this world. (But nevertheless, we’d rather create an AI that experiences some sort of enjoyment, or at least does not experience pain.) A more important consideration is the social side effects of the decision: does choosing to experiment in this way set a bad precedent that could make us more likely to devalue artificial life in other situations in the future? And will this affect our long-term goals in other ways?
If we are the kind of people who would delete lots of AIs, I don’t see why AIs would not see it as similarly ethical to delete lots of us.
So just in case we are a simulated AI’s simulation of its creators, we should not simulate an AI in a way it might not like? That’s 3 levels of a very specific simulation hypothesis. Is there some property of our universe that suggests to you that this particular scenario is likely? For the purpose of seriously considering the simulation hypothesis and how to respond to it, we should make as few assumptions as possible.
More to the point, I think you are suggesting that the AI will have human-like morality, like taking moral cues from others, or responding to actions in a tit-for-tat manner. This is unlikely, unless we specifically program it to do so, or it thinks that is the best way to leverage our cooperation.
An idea that I’ve had in the past was playing a game of 20 Questions with the AI, since the game of 20 Questions has probably been played so many times that every possible sequence of answers has come up at least once, which is evidence that no sequence of answers is extremely dangerous.
It’s not the sequence of answers that’s the problem—it’s the questions. You’ll be safe if you can vet the questions to ensure zero causal effect from any sequence of answers, but such questions are not interesting to ask almost by definition.
I have had some similar thoughts.
Alas.