In that case, what does the conditional goal look like when you translate it into a preference relation over outcomes?
We can’t reduce the domain of the utility function without destroying some information. If we tried to change the domain variables from [g, h, shutdown] to [g, shutdown], we wouldn’t get the desired behaviour. Maybe you have a particular translation method in mind?
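As an illustration (toy numbers only, not the actual construction of u): take a utility function over (g, h, shutdown) whose shutdown preference flips with h. Any projection down to (g, shutdown), e.g. averaging over h, collapses exactly the dependence that drives the desired behaviour.

```python
# Hypothetical sketch: a utility function on the full domain (g, h, shutdown)
# whose induced preferences cannot be recovered after dropping h.
from itertools import product

# Illustrative values only.
u = {
    # (g_achieved, h, shutdown): utility
    (True,  True,  True):  1.0,
    (True,  True,  False): 0.0,
    (True,  False, True):  0.0,
    (True,  False, False): 1.0,
    (False, True,  True):  0.5,
    (False, True,  False): 0.0,
    (False, False, True):  0.0,
    (False, False, False): 0.5,
}

# One candidate "translation method": reduce the domain to (g, shutdown)
# by averaging over h.  Other reductions lose the same information.
reduced = {}
for g, s in product([True, False], repeat=2):
    reduced[(g, s)] = sum(u[(g, h, s)] for h in [True, False]) / 2

# With h in the domain, the shutdown preference depends on h:
print(u[(True, True, True)] > u[(True, True, False)])    # True: prefers shutdown given h
print(u[(True, False, True)] > u[(True, False, False)])  # False: prefers not-shutdown given ~h
# After the reduction the dependence is gone; the preference has collapsed:
print(reduced[(True, True)], reduced[(True, False)])     # 0.5 0.5
```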
I don’t mess up the medical test because true information is instrumentally useful to me, given my goals.
Yep, that’s what I meant. The goal u is constructed to make information about h instrumentally useful for achieving u, even if g is poorly specified. The agent can prefer h over ~h or vice versa, just as we prefer a particular outcome of a medical test. But because of the instrumental (information) value of the test, we don’t interfere with it.
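A sketch of that information-value point, again with made-up numbers: observing h before acting raises expected utility, so the agent has no incentive to interfere with the test, even though it prefers one result over the other.

```python
# Toy value-of-information calculation (all numbers hypothetical).
# h is a binary fact the "test" reveals; the agent picks an action
# either before or after seeing the result.

p_h = 0.5                      # prior probability that h holds
actions = ["act_as_if_h", "act_as_if_not_h"]

# Utility of each action in each state of h (illustrative only).
utility = {
    ("act_as_if_h",     True):  1.0,
    ("act_as_if_h",     False): 0.2,
    ("act_as_if_not_h", True):  0.2,
    ("act_as_if_not_h", False): 1.0,
}

def expected_utility(action, p=p_h):
    return p * utility[(action, True)] + (1 - p) * utility[(action, False)]

# Without the test: commit to the single best action under the prior.
eu_without_test = max(expected_utility(a) for a in actions)

# With the test: observe h, then pick the best action for that result.
eu_with_test = (
    p_h * max(utility[(a, True)] for a in actions)
    + (1 - p_h) * max(utility[(a, False)] for a in actions)
)

# Positive value of information: corrupting the test would cost the agent
# expected utility, even though it still prefers h to ~h (or vice versa).
print(eu_with_test - eu_without_test)   # 0.4 with these numbers
```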
I think the utility-indifference genre of solutions (which try to avoid preferences between shutdown and not-shutdown) is unnatural and creates other problems. My approach allows the agent to shut down even if it would prefer to be in the non-shutdown world.