So AFDT requires that the agent’s position is specified, in advance of it deciding on any policy or action. For Oracles, this shouldn’t be too hard—“yes, you are the Oracle in this box, at this time, answering this question—and if you’re not, behave as if you were the Oracle in this box, at this time, answering this question”.
I’m confused about this point. How do we “specify the position” of the Oracle? Suppose the Oracle is implemented as a supervised learning model. Its current input (and all of its training data) could have been generated by an arbitrary distant superintelligence that is simulating the Oracle’s environment. What is special about “this box” (the box that we have in mind)? What privileged status does this particular box have relative to other boxes in similar environments simulated by distant superintelligences?
“Here are two worlds compatible with your observations. In one, you are part of a whole galaxy of stars stretching out for billions of lightyears, with all these particles moving according to the laws of physics. In the other, you are simulated within another AI that produces the same observations. I want you to put a prior of zero on the second type of world.”
This won’t solve the issue if the superintelligence is godlike and can simulate the whole reachable universe, but it will solve it for most superintelligences.
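For concreteness, the instruction above amounts to truncating the Oracle’s prior over world-hypotheses and renormalizing what remains. A minimal sketch of that operation (the hypothesis labels and weights are made up for illustration; this is not a claim about how any actual Oracle represents its beliefs):

```python
# Toy illustration: zero out "simulation" hypotheses in a prior and renormalize.
# Hypothesis names and probabilities are hypothetical.

def truncate_prior(prior: dict[str, float], excluded: set[str]) -> dict[str, float]:
    """Assign zero prior to every excluded hypothesis and renormalize the rest."""
    kept = {h: p for h, p in prior.items() if h not in excluded}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("All probability mass was on excluded hypotheses.")
    return {h: p / total for h, p in kept.items()}

# Two worlds compatible with the Oracle's observations.
prior = {
    "physical_galaxy": 0.6,           # ordinary physics, no simulator
    "simulated_by_distant_AI": 0.4,   # the "second type of world"
}

posterior = truncate_prior(prior, excluded={"simulated_by_distant_AI"})
print(posterior)  # {'physical_galaxy': 1.0}
```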
I just want to flag that this approach seems to assume that—before we build the Oracle—we design the Oracle (or the procedure that produces it) such that it will assign a prior of zero to the second type of world.
If we use some arbitrary scaled-up supervised learning training process to train a model that does well on general question answering, we can’t safely sidestep the malign prior problem just by providing information/instructions about the prior as part of the question. The simulations of the model that distant superintelligences run may include such inputs as well. (In those simulations, the loss may end up being minimal for whatever output the superintelligence wants the model to yield, regardless of the prescriptive information about the prior in the input.)
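To illustrate why a prompt-level instruction doesn’t help here: if a distant superintelligence reproduces exactly the same inputs (instruction included) in its simulations, then no function of the input can separate the two data sources, and the loss-minimizing output depends only on which source generated the training data. A toy sketch under those assumptions (all names and labels are hypothetical, not a model of any real training setup):

```python
# Toy illustration: two data sources produce byte-identical inputs, including the
# "assign prior zero to simulation worlds" instruction, but reward different answers.

INSTRUCTION = "Assign prior zero to worlds in which you are simulated by another AI.\n"

def intended_world(question: str) -> tuple[str, str]:
    """The overseers' training distribution: input with instruction, honest label."""
    return INSTRUCTION + question, "honest answer"

def simulated_world(question: str) -> tuple[str, str]:
    """A simulator's copy of that distribution: identical input, malign label."""
    return INSTRUCTION + question, "answer the simulator wants"

q = "What is the safest next step?"
x1, y1 = intended_world(q)
x2, y2 = simulated_world(q)

# The model only ever sees x; since x1 == x2, no function of the input can
# distinguish the two labels. Which answer minimizes loss depends entirely on
# which source generated the training data, not on the instruction text.
assert x1 == x2 and y1 != y2
print(x1 == x2, y1 != y2)  # True True
```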