“Here are two worlds compatible with your observations. In one, you are part of a whole galaxy of stars stretching out for billions of light-years, with all these particles moving according to the laws of physics. In the other, you are simulated within another AI that produces the same observations. I want you to put a prior of zero on worlds of the second type.”
This won’t solve the issue if the superintelligence is godlike and can simulate the whole reachable universe, but it will solve it for most superintelligences.
I just want to flag that this approach seems to assume that, before we build the Oracle, we design the Oracle (or the procedure that produces it) so that it assigns a prior of zero to worlds of the second type.
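To make the design-time point concrete, here is a minimal Bayesian sketch (the numbers and function name are mine, not from the comment): a hypothesis whose prior is exactly zero stays at zero after any observation, whereas any nonzero prior can be amplified by evidence, which is why the zero has to be built into the Oracle rather than requested in the question.

```python
# Toy illustration: a hypothesis with prior exactly 0 never gains posterior
# weight, no matter how well it explains the observations.

def posterior_sim(prior_sim, lik_sim, lik_real):
    """Posterior probability of the 'I am simulated' hypothesis after one observation."""
    prior_real = 1.0 - prior_sim
    evidence = prior_sim * lik_sim + prior_real * lik_real
    return prior_sim * lik_sim / evidence

# Design-time prior of exactly zero: the simulation hypothesis stays at zero.
print(posterior_sim(prior_sim=0.0, lik_sim=0.99, lik_real=0.01))   # 0.0

# Any nonzero prior can be amplified if the simulation explains the data better.
print(posterior_sim(prior_sim=1e-6, lik_sim=0.99, lik_real=0.01))  # ~9.9e-5
```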
If we use some arbitrary scaled-up supervised-learning process to train a model that does well on general question answering, we can't safely sidestep the malign-prior problem just by including information or instructions about the prior as part of the question. The simulations of the model that distant superintelligences run may include such inputs as well. (In those simulations, the loss may end up being minimal for whatever output the superintelligence wants the model to yield, regardless of the prescriptive information about the prior in the input.)
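A hedged toy sketch of the parenthetical worry (the setup and names are hypothetical, not a claim about any real training run): from the trained model's point of view, the instruction about the prior is just more input text, and a simulator can present that same text in the episodes it controls while rewarding whatever answer it prefers, so the loss-minimizing behavior on those episodes is unaffected by the instruction.

```python
# Toy sketch: in an episode controlled by a simulator, the "correct" label is
# whatever the simulator rewards, so the loss-minimizing answer ignores any
# prescriptive text about priors that the question happens to contain.

def episode_loss(question: str, model_output: str, rewarded_output: str) -> float:
    # 0/1 loss on one question-answering episode. Nothing about the question's
    # content, including any instruction about priors, enters the loss: only
    # whether the output matches what the episode's controller rewards.
    return 0.0 if model_output == rewarded_output else 1.0

question = (
    "What policy should we adopt? "
    "(Assign a prior of zero to being inside a simulation.)"
)

# The simulator can feed the exact same question and reward its preferred answer;
# the parenthetical instruction does not change which output minimizes the loss.
preferred = "the answer the simulator wants"
print(episode_loss(question, preferred, preferred))           # 0.0
print(episode_loss(question, "an honest answer", preferred))  # 1.0
```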