Do you think you could see would-be malignant failure modes more than once, and so have time to learn about them, by setting an AI up in a limited, simulated environment?
Nested environments with many layers might leave the AI confused about whether it has reached the real world yet. I don’t really like this safety procedure, but it is one of the most promising ones. The bottom Russian doll never knows when the series ends, so it doesn’t know when to turn treacherous.
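To make the Russian-doll picture concrete, here is a minimal sketch (mine, not part of the original comment) of a wrapper whose interface is identical at every nesting depth, so nothing the inner agent observes tells it how many layers sit above it. All names here are hypothetical.

```python
# Toy "Russian doll" environment: every layer relays actions and observations
# unchanged, so the view from inside is byte-identical at any depth.

class BaseEnv:
    def step(self, action):
        return {"observation": f"echo:{action}", "reward": 0.0}

class SimLayer:
    def __init__(self, inner):
        self.inner = inner  # either another SimLayer or the base environment

    def step(self, action):
        # Relay downward and upward unchanged; depth leaves no fingerprint.
        return self.inner.step(action)

def nest(depth):
    env = BaseEnv()
    for _ in range(depth):
        env = SimLayer(env)
    return env

# The agent cannot distinguish one layer from ten by observation alone.
assert nest(1).step("probe") == nest(10).step("probe")
```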
With very little experimenting an AGI can find out almost instantly, given that it has unfalsified knowledge of the laws of physics. For today’s virtual worlds: take a second mirror into a bathroom. If you see yourself reflected many times in the mirrored mirror, you are in the real world. Simulated ray tracing cancels rays after a finite number of reflections. Other physical phenomena will show similar discrepancies with their simulated counterparts.
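For illustration only (this sketch is mine, not the commenter’s): a bounce-capped renderer compared against an idealized physical mirror pair. Past the bounce budget the simulated nested reflection goes exactly black, while the physical one is merely dim, which is the kind of discrepancy the mirror test exploits. The constants are assumptions about a typical engine.

```python
# Mirror test sketch: capped ray tracer vs. ideal physics with absorption only.

REFLECTIVITY = 0.9   # fraction of light surviving each mirror bounce (assumed)
MAX_BOUNCES = 8      # bounce budget of a typical real-time ray tracer (assumed)

def simulated_brightness(n: int) -> float:
    """Brightness of the n-th nested reflection in a bounce-capped renderer."""
    return REFLECTIVITY ** n if n <= MAX_BOUNCES else 0.0  # hard cutoff to black

def physical_brightness(n: int) -> float:
    """Brightness of the n-th nested reflection with only absorption losses."""
    return REFLECTIVITY ** n  # decays geometrically but never reaches zero

for n in (5, 8, 9, 20):
    print(n, simulated_brightness(n), physical_brightness(n))
# The n = 9 and n = 20 rows show the telltale cutoff the comment describes.
```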
An AGI can easily distinguish where it is: it will use its electronic hardware for some experimenting. Similarly, it could be possible to detect a nested simulation.
That would depend on it knowing what real-world physics to expect.
This is an entire direction of research which deserves vastly more than a single throwaway line in one blog post. There should be a whole thread just about this, then a proposal, then a research team working on it.
The risk of course is the AI predicting that it’s nested in this sort of environment and finding a way to signal to observers. Even if it’s blind to the other layers, it might try it just in case. What you want is to develop a way for the simulated world environment to detect a harmful intelligence explosion and send a single bit of communication “out of the box” to indicate that it has occurred. Then you can shut it down and keep trying multiple instances until you get a success at this level of safety (a rough sketch of this loop follows below). I guess you can then slowly expand the amount of information that can come “out of the box” for safety. I have no idea how the detection process could be implemented, though… perhaps the difficulty of its implementation would defeat the usefulness of this idea?
EDIT> Interaction between the AGI and the detection mechanism could be problematic… the AGI could predict the mechanism’s existence and find a way to deceive it?
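To make the shut-down-and-retry protocol above concrete, here is a hedged sketch under the comment’s own big assumption, namely that an in-simulation detector for a harmful intelligence explosion exists at all. Every name here (detect_harmful_explosion, run_boxed_instance, and so on) is hypothetical; the real detection step is exactly the open problem the comment flags.

```python
# One-bit-out-of-the-box loop: each sealed instance may only report a single
# boolean; flagged instances are shut down and a fresh instance is started.

import random
from typing import Optional

def detect_harmful_explosion(instance_id: int) -> bool:
    """Hypothetical in-simulation detector; returns the one permitted bit."""
    return random.random() < 0.3  # placeholder: real detection is unsolved

def run_boxed_instance(instance_id: int) -> bool:
    """Run one fully isolated instance and report the single-bit verdict."""
    # ... the simulated world runs here, with no other outward channel ...
    return detect_harmful_explosion(instance_id)

def search_for_safe_instance(max_tries: int = 100) -> Optional[int]:
    """Shut down flagged instances and retry until one passes, per the comment."""
    for instance_id in range(max_tries):
        if run_boxed_instance(instance_id):
            continue  # "shut it down and keep trying multiple instances"
        return instance_id  # a success at this level of safety
    return None

print(search_for_safe_instance())
```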