The problem is that the AI doesn’t know the correct utility function a priori, and whatever process it uses to discover that function is going to be attacked by Mu.
I don’t understand the issue here. Mu can only interfere with the simulated AI’s process of utility-function discovery. If the AI follows the policy of “behave as if I’m outside the simulation”, then sure, the AIs simulated by Mu will recover tampered utility functions. But AIs instantiated in the non-simulated universe, who deliberately avoid thinking about Mu or who discount simulation hypotheses, should just safely recover the untampered utility function. Mu can’t acausally influence you unless you deliberately open a channel to it.
I think I’m missing some part of the picture here. Is it assumed that any process of utility-function discovery has to somehow route through (something like) the unfiltered universal prior? Or that uncertainty with regard to one’s utility function means you can’t rule out the simulation hypothesis out of the gate, because it might be that what you genuinely care about is the simulators?
The problem is that any useful prior must be based on Occam’s razor, and Occam’s razor + first-person POV creates the same problems as with the universal prior. And deliberately filtering out simulation hypotheses seems quite difficult, because it’s unclear how to specify it. See also this.
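To make the analogy concrete (a minimal sketch of the standard malign-prior worry, with the set $S$ introduced here purely for illustration): under a Solomonoff-style Occam prior, a hypothesis $h$, i.e. a program generating your first-person observations, gets weight proportional to $2^{-|h|}$, where $|h|$ is its description length. A hypothesis of the form “a simple universe evolves a simulator (Mu) that feeds you exactly these observations” can be about as short as the intended “physics plus a pointer to your observations” hypothesis, so it retains non-negligible posterior mass. Filtering would mean conditioning on $h \notin S$ for some set $S$ of simulation hypotheses:

$$
P(h \mid \text{obs}) \;\propto\; 2^{-|h|} \,\mathbb{1}[h \text{ predicts obs}] \,\mathbb{1}[h \notin S],
$$

and the difficulty is that we have no formal definition of membership in $S$.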
deliberately filtering out simulation hypotheses seems quite difficult, because it’s unclear how to specify it
Aha, that’s the difficulty I was overlooking. Specifically, I didn’t consider that the approach under consideration here requires us to formally define how we’re filtering them out.
Thanks!
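To make the formalization gap concrete, here is a toy sketch (purely illustrative; the names `occam_weight`, `is_simulation_hypothesis`, and `filtered_prior` are hypothetical, not from this thread): an Occam-weighted prior over an enumerated hypothesis space, where the whole filtering proposal collapses into a single predicate nobody knows how to write.

```python
from typing import Iterable

def occam_weight(description: str) -> float:
    """Toy Occam's razor: weight 2^-(description length), standing in
    for the 2^-|p| weighting of programs in a Solomonoff-style prior."""
    return 2.0 ** -len(description)

def is_simulation_hypothesis(description: str) -> bool:
    """The entire difficulty lives here: a formal, non-gameable test for
    'this hypothesis routes my observations through simulators'.
    Nobody knows how to specify this predicate, hence the stub."""
    raise NotImplementedError("unclear how to specify")

def filtered_prior(hypotheses: Iterable[str]) -> dict[str, float]:
    """Renormalized Occam prior over the hypotheses that survive the
    filter -- well-defined only to the extent the predicate above is."""
    kept = {h: occam_weight(h) for h in hypotheses
            if not is_simulation_hypothesis(h)}
    total = sum(kept.values())
    return {h: w / total for h, w in kept.items()}
```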