Alternatively, you could say that the AI is an abstract computational process which is trying to achieve certain results in another abstract computational process (the toy world) that it is interfacing with.
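To make that framing concrete, here is a minimal sketch (purely my own illustration, not anything from the original discussion) of an agent process whose only inputs and outputs pass through a toy-world interface; the `ToyWorld` and `Agent` names and dynamics are hypothetical:

```python
# Minimal sketch: the agent only ever sees toy-world states and only ever
# emits toy-world actions; nothing about the outside world appears in its
# interface. Names and dynamics are illustrative, not a real API.

class ToyWorld:
    """An abstract computational process the agent is optimizing over."""
    def __init__(self):
        self.state = 0

    def step(self, action: int) -> int:
        # Toy dynamics: the world state just accumulates actions.
        self.state += action
        return self.state


class Agent:
    """An abstract computational process trying to reach a target state."""
    def __init__(self, target: int):
        self.target = target

    def act(self, observed_state: int) -> int:
        # Nudge the toy-world state toward the target, one unit at a time.
        if observed_state < self.target:
            return 1
        if observed_state > self.target:
            return -1
        return 0


world = ToyWorld()
agent = Agent(target=3)
obs = world.state
for _ in range(10):
    obs = world.step(agent.act(obs))
print(obs)  # 3 -- the agent only ever interacted with the toy world
```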
I expect there to be significant difficulties in achieving this in practice. We can design theoretical models that provide isolation, but implementation details often result in leaks in the abstraction. To give a simple example I know about: quantum cryptography is theoretically “perfect”, but the first implementations had a fatal flaw due to the realities of transmitting qubits over the wire, because each qubit was represented not by a single particle but by multiple particles, permitting a “siphoning” attack. We see similar issues with classical computer memory (in the form of row-hammer attacks) and spinning disks (in the form of head-skip attacks). Given that even a relatively low-powered AI was able to accidentally create a radio receiver (although in a seemingly not-well-sandboxed environment, to be fair), we should expect something aiming for superintelligence not to be much hindered by being presented with a toy version of the world, since the toy world will almost certainly be leaky even if we don’t notice the leaks ourselves.
Are you worried about leaks from the abstract computational process into the real world, leaks from the real world into the abstract computational process, or both? (Or maybe neither and I’m misunderstanding your concern?)
There will definitely be tons of leaks from the abstract computational process into the real world; just looking at the result is already such a leak. The point is that the AI should have no incentive to optimize such leaks, not that the leaks don’t exist, so the existence of additional leaks that we didn’t know about shouldn’t be concerning.
Leaks from the outside world into the computational abstraction would be more concerning, since the whole point is to prevent those from existing. It seems like it should be possible to make hardware arbitrarily reliable by devoting enough resources to error detection and correction, which would prevent such leaks, though I’m not an expert, so it would be good to know if this is wrong. There may be other ways to get the AI to act similarly to the way it would in the idealized toy world even when hardware errors create small differences. This is certainly the sort of thing we would want to take seriously if hardware can’t be made arbitrarily reliable.
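As a rough illustration of the kind of redundancy I have in mind (a standard technique sketched here, not a claim about how such a system would actually be engineered), triple modular redundancy with majority voting masks a fault in any single copy of a value:

```python
# Sketch of error correction via redundancy (triple modular redundancy):
# keep three copies of each value and take a majority vote on read, so a
# fault in any one copy does not change what the computation sees.
from collections import Counter

def write(value: int) -> list[int]:
    return [value, value, value]  # three independent copies

def read(copies: list[int]) -> int:
    # Majority vote masks a fault in any single copy.
    return Counter(copies).most_common(1)[0][0]

copies = write(42)
copies[1] ^= 0b100           # simulate a hardware fault flipping a bit in one copy
assert read(copies) == 42    # the abstraction the computation sees is unaffected
```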
Incidentally, that story about the accidental creation of a radio with an evolutionary algorithm was part of what motivated my post in the first place. If the evolutionary algorithm had tested its oscillator designs in a computer model rather than in the real world, then it would not have built a radio receiver, since radio signals from nearby computers would not have been included in the computer model of the environment, even though they were present in the actual environment.
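To spell the distinction out, here is a toy sketch (entirely my own illustration; the real experiment evolved FPGA circuits, not Python functions) of a fitness function evaluated inside a closed simulation, where candidate designs can only be rewarded for behavior the simulation itself models, so ambient radio signals simply don't exist for them to exploit:

```python
# Toy illustration: fitness computed in a closed simulation can only reward
# behavior the simulation models. (Purely illustrative stand-in for the
# real FPGA-evolution experiment.)
import random

def simulated_fitness(design: list[float]) -> float:
    # The simulated environment contains ONLY what we explicitly modeled:
    # a single target oscillation frequency. There is no ambient radio
    # signal in the model for a candidate design to pick up and exploit.
    target_frequency = 1000.0
    produced_frequency = sum(design)  # stand-in for a circuit simulation
    return -abs(produced_frequency - target_frequency)

def evolve(generations: int = 200, pop_size: int = 50) -> list[float]:
    population = [[random.uniform(0, 100) for _ in range(20)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=simulated_fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = [
            [gene + random.gauss(0, 1) for gene in random.choice(survivors)]
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=simulated_fitness)

best = evolve()
print(simulated_fitness(best))  # improves only along the modeled criterion
```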