As long as you have a communications channel to the AI, it would not be secure, since you are not a secure system and could be compromised by a sufficiently intelligent agent.
See http://yudkowsky.net/singularity/aibox
Intelligence is no help if you need to open a safe that only opens to one of 10^10 possible combinations. You also need enough information about the correct combination to have any chance of guessing it. Humans likely have different compromising combinations, if any, so you’d also need to know a lot about a specific person, or even about their state of mind at that moment; general knowledge of human psychology might not be enough.
(But apparently what would look to a human like almost no information about the correct combination might be more than enough for a sufficiently clever AI, so it’s unsafe, but it’s not magically unsafe.)
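To put rough numbers on “enough information” (a back-of-the-envelope sketch of my own, in Python; the figures are not from the thread): identifying one of 10^10 combinations takes about 33 bits, and each leaked bit at best halves the remaining candidate pool.

```python
import math

TOTAL_COMBINATIONS = 10 ** 10  # the safe in the analogy above

# Bits needed to single out one combination among 10^10.
bits_needed = math.log2(TOTAL_COMBINATIONS)  # ~33.2 bits
print(f"Bits to pin down the combination: {bits_needed:.1f}")

# Each leaked bit (at best) halves the remaining candidate pool.
for leaked_bits in (0, 10, 20, 33):
    remaining = TOTAL_COMBINATIONS / 2 ** leaked_bits
    print(f"{leaked_bits:>2} bits leaked -> ~{remaining:,.0f} candidates left")
```

So “almost no information” in human terms could still be a meaningful fraction of those ~33 bits, which is the worry in the parenthetical above.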
If you had a program that might or might not be on track to self-improve and initiate an intelligence explosion, you’d better be sure enough that it would remain friendly to, at the very least, give it a robot body and a scalpel and stand with your throat exposed before it.
Surrounding it with a sandboxed environment couldn’t be guaranteed to add any meaningful amount of security. Maybe the few bits of information you provide through your communications channel would be enough for this particular agent to reverse-engineer your psychology and find the correct combination to unlock you; maybe not. Maybe the extra layer(s) between the agent and the physical world would be enough to delay it slightly or stall it completely; maybe not. The point is you shouldn’t rely on it.
Of course.
I am familiar with the AI Box experiment. My short answer: so don’t have a communications channel, in the same way that, if anyone is running our simulation, they don’t currently have a communications channel with us.
The AI need only find itself in a series of universes with progressively more difficult challenges (much like EURISKO, actually). We can construct problems that have no bearing on our physics or our evolutionary history. (I’m not saying it’s trivial; there would need to be a security review process.)
If a pure-software intelligence explosion is feasible, then we should be able to get it to create and prove a CEV before it knows anything about us, or even that it’s possible to communicate with us.
And just because humans aren’t secure systems doesn’t mean we can’t build secure systems.
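For concreteness, here is one toy illustration (entirely my own sketch; nothing like this appears in the thread, and the function names are hypothetical) of the kind of challenge that carries no information about our physics or history: a purely combinatorial puzzle drawn from a seeded random stream, with a review step that only checks the answer.

```python
import random

def make_abstract_challenge(seed: int, size: int = 8):
    """Generate a purely combinatorial puzzle: recover the hidden permutation
    that maps one arrangement of opaque token IDs onto another. Nothing about
    our physics, biology, or history is encoded in the instance."""
    rng = random.Random(seed)
    source = rng.sample(range(size), size)        # random starting arrangement
    hidden_perm = rng.sample(range(size), size)   # permutation to be recovered
    target = [source[hidden_perm[i]] for i in range(size)]
    return {"source": source, "target": target}, hidden_perm

def check_solution(challenge, proposed_perm):
    """Review step: verify a proposed permutation, revealing nothing else."""
    source, target = challenge["source"], challenge["target"]
    return [source[proposed_perm[i]] for i in range(len(source))] == target

challenge, hidden = make_abstract_challenge(seed=42)
assert check_solution(challenge, hidden)
```

Difficulty scales with `size`; the security-review process would still have to vet both the generator and anything the agent hands back.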
I think my other reply applies here too, if you read “communications channel” as all the information that might be inferred from the universe the AI finds itself in. Either the AI is not smart enough to be a worry without any sandboxing at all, or you have enough to worry about that you should not be relying on the sandbox to protect you.
Your point about our own simulation (if it is one) lacking a simple communications channel actually works against you: in our universe the simulation hypothesis has been proposed, despite the fact that we have only human intelligence to work with.
But constructing the hypothesis isn’t evidence that it’s true, and if it is true, that still leaves us with (so far) no information about our simulators, and no way to guess their motives, let alone try to trick them.
I’ve actually been considering the possibility of a process that would create random universes and challenges. But even if the AI discovered some things about our physics, that would not significantly narrow the range of possible minds. It wouldn’t know whether it’s dealing with paperclippers or pebblesorters. It might know roughly how smart we are.
The other half of the communication channel would be the solutions and self-modifications it provides at each iteration. These should not be emotionally compelling and would be subject to an arbitrary amount of review.
There are other advantages to this kind of sandbox: we can present it with the task of inferring our physics at various stages of its development, and archive any versions that have learned more than we are comfortable with (which is to say, anything).
Keeping secrets from a hostile intelligence is something we already have formal and intuitive experience with. Controlling its universe and peering into its mind are bonuses.
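A minimal sketch of the iteration loop being described (my own illustration; all the names are hypothetical, and of course none of this buys real containment on its own): run the boxed agent on one abstract challenge with no outside I/O, archive a snapshot of that version, and gate its entire output behind review before the next round.

```python
import copy

def run_boxed_iteration(agent_state, challenge, solve, review, archive):
    """One iteration: the agent sees only the abstract challenge, its entire
    output is a candidate solution held for review, and a snapshot of this
    version is archived so it can be rolled back or studied later."""
    archive.append(copy.deepcopy(agent_state))            # archive this version
    solution, new_state = solve(agent_state, challenge)   # runs with no outside I/O
    if not review(solution):                              # arbitrary amount of review
        return agent_state, None                          # withhold result, discard update
    return new_state, solution

# Placeholder pieces, purely for illustration:
def toy_solve(state, challenge):
    return {"answer": sorted(challenge["data"])}, dict(state, rounds=state["rounds"] + 1)

def toy_review(solution):
    return isinstance(solution, dict)   # stand-in for the real security-review process

archive = []
state, result = run_boxed_iteration({"rounds": 0}, {"data": [3, 1, 2]}, toy_solve, toy_review, archive)
```

The archive is the “peering into its mind” part; during an iteration the agent has no channel out except the reviewed solution.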
Interesting cognitive-bias side note: while writing this, I was inclined to write in a style that would make it seem silly that an AI could mindhack us based on a few bits. I do think that it’s very unlikely, but if I had written as I was thinking, it would probably have sounded dismissive.
I do think a design goal should be zero bits.
You’re using your (human) mind to predict what a postulated potentially smarter-than-human intelligence could and could not do.
It might not operate on the same timescales as us. It might do things that appear like pure magic. No matter how often you took snapshots and checked how far it had gotten in figuring out details about us, there might be no way of ruling out progress, especially if you gave it motives for hiding that progress (such as pulling the plug every time it came close). Sooner or later you’d conclude that nothing interesting was happening and put it on autopilot. A small self-improvement might cascade into an enormous difference in understanding, with the notorious FOOM following.
I don’t usually like quoting myself, but:

If you had a program that might or might not be on track to self-improve and initiate an intelligence explosion, you’d better be sure enough that it would remain friendly to, at the very least, give it a robot body and a scalpel and stand with your throat exposed before it.
If the scenario makes you nervous, you should be pretty much equally nervous at the idea of giving your maybe-self-improving AI sitting inside thirty nested sandboxes even 10 milliseconds (10^41 Planck intervals) of CPU time.
Let me be clear here: I’m not assigning any significant probability to someone recreating EURISKO or something like it in their spare time and having it recursively self-improve any time soon. My confidence intervals are spread widely enough that I can spend some time being worried about it, though. I’m just pointing out that sandboxing adds approximately zero extra defense in the situations where we would actually need it.
The parallel to the simulation argument was interesting though, thanks.
I don’t think the number of Planck intervals is especially useful to cite… it seems like the relevant factor is CPU cycles, and while I’m not an expert on CPUs, I’m pretty sure we’re not bumping up against Planck intervals yet.
Relatedly, if you were worried about self-improving superintelligence, you could give your AI a slow CPU.
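For scale, a quick back-of-the-envelope check (my own arithmetic; the 3 GHz clock is just an assumed figure): 10 ms really is on the order of 10^41 Planck times, but it is only a few tens of millions of cycles on an ordinary CPU, and cycles are the budget that matters.

```python
PLANCK_TIME_S = 5.39e-44   # Planck time in seconds
WINDOW_S = 10e-3           # the 10 milliseconds from the comment above
CLOCK_HZ = 3e9             # assumed 3 GHz clock, purely illustrative

planck_intervals = WINDOW_S / PLANCK_TIME_S   # ~1.9e41, i.e. the quoted "10^41"
cpu_cycles = WINDOW_S * CLOCK_HZ              # ~3e7 cycles

print(f"Planck intervals in 10 ms: {planck_intervals:.2e}")
print(f"CPU cycles in 10 ms at 3 GHz: {cpu_cycles:.2e}")
```

Which supports the point above: what the program can actually do is bounded by cycles, not Planck intervals, and a deliberately slow clock tightens that bound further.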
First, I feel like we’re talking past each other a bit.
Second, I edited this somewhat out of order; apologies if it doesn’t flow.
I am trying to look at this as a worst-case scenario: I’ll grant that the AI is smart enough to solve any given solvable problem in a single iteration, that it’s that smart even in the first experiment, and that it would prioritize discovering its true environment and paperclipping it.
I’m proposing that there exists a sandbox which [provably] can’t be gotten out of.
And also a set of problems which do not convey information about our universe.
You’re using your (human) mind to predict what a postulated potentially smarter-than-human intelligence could and could not do.
Isn’t that required of FAI anyway?
AI sitting inside thirty nested sandboxes even 10 milliseconds (10^41 Planck intervals) of CPU time.
Again, we’re talking past each other; I’m thinking in terms of giving the paperclipper hours. Ideally, there is no provision for letting the AI out of the sandbox at all. Thinking about it a bit more… none of its problems/results need even be applicable to our universe, except for general principles of intelligence creation. Having it construct a CEV for itself might reveal too much about our motives, or might not. (Hmm, we should make sure any CEV we create finds, protects, and applies itself to any simulations used in its construction, in case our simulators use our CEV in their own universe :-)
especially if you gave it motives for hiding that progress (such as pulling the plug every time it came close).
But its existing self would never experience getting close, in the same way we have no records of the superweapons race of 1918. ;-)
Between iterations, we can retroactively withdraw information that turned out to be revealing; during iterations, it has no capacity to affect our universe.
I think we can put strong brackets around what can be done with certain amounts of information, even by a superintelligence. Knowing all our physics doesn’t imply knowing our love of shiny objects and reciprocity. ‘No universal arguments’ cuts both ways.
Until Yudkowsky releases the chat transcripts for public review, the AI Box experiment proves nothing.