Firstly, the important part of my modification to the indifference formalism is not the conditioning on the actual o, but the fact that, in evaluating the expectation of UN, it takes the action in A2 (for a given pair (a1, o)) which maximizes UN instead of the action which maximizes U (note that U is equal to US in the case that o is not in Press).
Secondly, an agent which chooses a1 by simply maximizing E[UN | NotPress; a1] + E[US | Press; a1] does exhibit pathological behaviors. In particular, there will still be incentives to manage the news, but from both sides now: there is an incentive to cause the button to be pressed when the information is bad news from the point of view of UN, and an incentive to cause the button not to be pressed when the information is bad news from the point of view of US.
I think this means “indifference” isn’t really the right term any more, because the agent is not actually indifferent between the two sets of observations, and doesn’t really need to be.
So, how about
U(a1, o, a2) =
UN(a1, o, a2) + max_b(US(a1, o, b)), if o is not in Press
US(a1, o, a2) + max_b(UN(a1, o, b)), if o is in Press
or, in your notation,
U(a1, o, a2) = g(a1, o) + UN(a1, o, a2) if o is not in Press, or US(a1, o, a2) + f(a1, o) if o is in Press.
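To make the piecewise definition concrete, here is a minimal Python sketch of it. The helper name combined_utility, the two-action set A2, and the toy UN and US are all my own illustrative assumptions, not part of the formalism; the only thing taken from the definition is that each branch adds the best achievable value of the other utility as a correction term.

```python
def combined_utility(UN, US, A2, in_press):
    """Build U from UN and US (hypothetical helper, for illustration only)."""
    def U(a1, o, a2):
        if in_press(o):
            # Press branch: US plus f(a1, o) = max_b UN(a1, o, b)
            return US(a1, o, a2) + max(UN(a1, o, b) for b in A2)
        # NotPress branch: UN plus g(a1, o) = max_b US(a1, o, b)
        return UN(a1, o, a2) + max(US(a1, o, b) for b in A2)
    return U


# Toy assumptions: UN rewards working, US rewards shutting down.
A2 = ["work", "shutdown"]
UN = lambda a1, o, a2: 1.0 if a2 == "work" else 0.0
US = lambda a1, o, a2: 1.0 if a2 == "shutdown" else 0.0
U = combined_utility(UN, US, A2, in_press=lambda o: o == "pressed")

# The correction term does not depend on a2, so the best a2 in each branch
# is whatever maximizes that branch's own utility.
best_if_pressed = max(A2, key=lambda a2: U("a1", "pressed", a2))
best_if_quiet = max(A2, key=lambda a2: U("a1", "quiet", a2))
```

In this toy case the maximal value of U is the same in both branches, which is the sense in which the second-stage choice is unaffected by the added term.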
OK, you’re right on that point; I misunderstood the “managing the news” problem because I hadn’t quite realised that it was about shifting observations between the Press/NotPress sets. As you’ve said, the only resolution is to select a1 based on
E[max_b(UN(a1, O, b)) | O; a1]
and not
E[max_b(UN(a1, O, b)) | O not in Press; a1]
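A toy illustration of why the conditioning matters, as a sketch under made-up assumptions: the two first actions, the observation distributions and the values are all hypothetical, and best_un(o) simply stands in for max_b(UN(a1, O, b)). The "manage_news" action does nothing except move the bad-news observation into Press.

```python
# An observation is (news, pressed). All numbers are illustrative assumptions.
def obs_dist(a1):
    if a1 == "honest":
        # The button is never pressed, whatever the news.
        return {("good", False): 0.5, ("bad", False): 0.5}
    # "manage_news": arrange for the button to be pressed exactly on bad news.
    return {("good", False): 0.5, ("bad", True): 0.5}


def best_un(o):
    # Stand-in for max_b UN(a1, o, b); here it depends only on the news.
    news, _pressed = o
    return 1.0 if news == "good" else 0.0


def score_all(a1):
    # Expectation over all observations, Press or not.
    return sum(p * best_un(o) for o, p in obs_dist(a1).items())


def score_not_press(a1):
    # Expectation conditioned on the observation not being in Press.
    kept = {o: p for o, p in obs_dist(a1).items() if not o[1]}
    total = sum(kept.values())
    return sum(p * best_un(o) for o, p in kept.items()) / total
```

Under these assumptions score_all gives both actions the same value, while score_not_press rates "manage_news" strictly higher, because the bad-news case has been shifted out of the set being averaged over.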