Nothing new here, just carrying on explaining my understanding in case it helps others:
Following on from (2): in the simple case where the AI can ask the advisor or not, we want the expected utility after asking to also be used to evaluate the case where the AI doesn’t ask. i.e.
E[p(C=u_1 | A=“don’t ask”)] := E[p(C=u_1 | A=“ask”)] (:= is assignment; C is the correct utility function)
So we’ll renormalise the probability of each utility function in the “don’t ask” scenario.
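To make the assignment concrete, here is a minimal sketch of how I read it. The numbers, the answer labels, and the helper names (normalise, expected_distribution_after_asking) are all made up for illustration and are not from the original proposal. It averages the posterior over C across the advisor's possible answers, then copies that expected distribution into the “don’t ask” branch.

```python
def normalise(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {u: w / total for u, w in weights.items()}


def expected_distribution_after_asking(answer_probs, posterior_given_answer):
    """Average the posterior over C across the advisor's possible answers."""
    expected = {}
    for answer, p_answer in answer_probs.items():
        for u, p_u in posterior_given_answer[answer].items():
            expected[u] = expected.get(u, 0.0) + p_answer * p_u
    return normalise(expected)


# Hypothetical numbers: how likely each answer is, and the posterior it induces.
answer_probs = {"advisor says u1": 0.5, "advisor says u2": 0.5}
posterior_given_answer = {
    "advisor says u1": {"u1": 0.8, "u2": 0.2},
    "advisor says u2": {"u1": 0.4, "u2": 0.6},
}

beliefs = {
    "ask": expected_distribution_after_asking(answer_probs, posterior_given_answer)
}
# The assignment step: "don't ask" is evaluated with the same expected distribution.
beliefs["don't ask"] = dict(beliefs["ask"])

print(beliefs)
```

With these made-up numbers both branches end up with roughly 0.6 on u1 and 0.4 on u2, so skipping the question gives the AI no different weighting of the utility functions to exploit.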
A more complex case arises where multiple actions cause changes in the utility function, e.g. if there are a bunch of different advisors. In these more complex cases, it’s not so useful to think about a direction of assignment. The more useful model for what’s going on is that the agent must have a distribution over C that is updated when it gets a different model of what the advisors will say.
Basically, requiring the agent to update its distribution over utility functions in a way that obeys the axioms of probability will prevent the agent from sliding toward the utility functions that are easiest to fulfil.
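A toy check of that last claim, under assumptions of my own: two utility functions (u_easy, u_hard), two hypothetical advisors with made-up likelihoods, and a deliberately broken update rule (biased_update) that inflates the easy utility function after every answer. With a proper Bayesian update, the expected posterior equals the prior whichever advisor is consulted, so choosing whom to ask cannot shift the agent's expected beliefs. With the broken rule, the expected posterior drifts toward u_easy.

```python
def bayes_update(prior, likelihood, answer):
    """Posterior over utility functions after hearing `answer` (Bayes' rule)."""
    joint = {u: prior[u] * likelihood[u][answer] for u in prior}
    total = sum(joint.values())
    return {u: p / total for u, p in joint.items()}


def biased_update(prior, likelihood, answer, easy="u_easy", boost=2.0):
    """Hypothetical broken rule: a Bayes update, then extra weight on the easy utility."""
    post = bayes_update(prior, likelihood, answer)
    post[easy] *= boost
    total = sum(post.values())
    return {u: p / total for u, p in post.items()}


def expected_posterior(prior, likelihood, update):
    """Average the updated distribution over the answers the advisor might give."""
    answers = list(next(iter(likelihood.values())))
    expected = {u: 0.0 for u in prior}
    for answer in answers:
        p_answer = sum(prior[u] * likelihood[u][answer] for u in prior)
        if p_answer == 0:
            continue  # an impossible answer contributes nothing
        post = update(prior, likelihood, answer)
        for u in prior:
            expected[u] += p_answer * post[u]
    return expected


prior = {"u_easy": 0.5, "u_hard": 0.5}

# Made-up likelihoods P(answer | C = u) for two hypothetical advisors.
advisors = {
    "advisor_a": {"u_easy": {"yes": 0.8, "no": 0.2}, "u_hard": {"yes": 0.3, "no": 0.7}},
    "advisor_b": {"u_easy": {"yes": 0.6, "no": 0.4}, "u_hard": {"yes": 0.5, "no": 0.5}},
}

results = {}
for name, likelihood in advisors.items():
    results[name] = {
        "bayes": expected_posterior(prior, likelihood, bayes_update),
        "biased": expected_posterior(prior, likelihood, biased_update),
    }
    print(name, results[name])
```

The "bayes" rows should match the prior for both advisors, while the "biased" rows put more than 0.5 on u_easy. That gap is the sliding that the probability axioms rule out.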