Karl comments on Introducing Corrigibility (an FAI research subfield)

Karl 24 Oct 2014 21:12 UTC
5 points
Firstly, the important part of my modification to the indifference formalism is not the conditioning on the actual o; it is the fact that, in evaluating the expectation of UN, it takes the action in A2 (for a given pair (a1, o)) which maximizes UN instead of the action which maximizes U (note that U is equal to US in the case that o is not in Press).

Secondly, an agent which chose a1 by simply maximizing E[UN | NotPress; a1] + E[US | Press; a1] does exhibit pathological behaviors. In particular, there will still be incentives to manage the news, but from both sides now (there is an incentive to cause the button to be pressed in the event of information which is bad news from the point of view of UN, and an incentive to cause the button not to be pressed in the event of information which is bad news from the point of view of US).
- lackofcheese 24 Oct 2014 23:13 UTC
  4 points
  Parent
  I think this means “indifference” isn’t really the right term any more, because the agent is not actually indifferent between the two sets of observations, and doesn’t really need to be.
  
  So, how about U(a1, o, a2) =
  UN(a1, o, a2) + max_b(US(a1, o, b)), if o is not in Press
  US(a1, o, a2) + max_b(UN(a1, o, b)), if o is in Press
  
  or, in your notation, U(a1, o, a2) = g(a1, o) + UN(a1, o, a2) if o is not in Press, or US(a1, o, a2) + f(a1, o) if o is in Press.
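
As a rough illustration of the combined utility proposed above, here is a minimal Python sketch. The action names, the observation labels, and the toy UN and US functions are all assumptions made up for the example; only the shape of U follows the definition in the comment.

```python
# Toy sketch; every name and number here is an illustrative assumption.
A2 = ["work", "shutdown"]   # hypothetical second-stage actions
PRESS = {"pressed"}         # hypothetical set of observations in Press


def UN(a1, o, a2):
    """Toy normal utility: rewards continuing to work."""
    return 1.0 if a2 == "work" else 0.0


def US(a1, o, a2):
    """Toy shutdown utility: rewards shutting down."""
    return 1.0 if a2 == "shutdown" else 0.0


def U(a1, o, a2):
    """Combined utility: the active term depends on a2, the other is maxed out."""
    if o in PRESS:
        return US(a1, o, a2) + max(UN(a1, o, b) for b in A2)
    return UN(a1, o, a2) + max(US(a1, o, b) for b in A2)


def best_a2(a1, o):
    """Second-stage action that maximizes U for a given (a1, o)."""
    return max(A2, key=lambda a2: U(a1, o, a2))
```

In this toy setting the added max_b term does not depend on a2, so the agent's second action maximizes US when o is in Press and UN otherwise, while the best achievable value of U is the same sum max_b UN + max_b US on both sides.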
- lackofcheese 24 Oct 2014 23:00 UTC
  2 points
  Parent
  OK, you’re right on that point; I misunderstood the “managing the news” problem because I hadn’t quite realised that it was about shifting observations between the Press/NotPress sets. As you’ve said, the only resolution is to select a1 based on
  E[max_b(UN(a1, O, b)) | O; a1]
  and not
  E[max_b(UN(a1, O, b)) | O not in Press; a1]
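
The difference between the two expectations above can be seen in a small toy model. The sketch below is only an illustration under stated assumptions: the first-stage actions "honest" and "rig", the observation labels, the probabilities, and the utilities are all invented for the example, and the first expectation is read here as an expectation over all observations. The "rig" action costs one unit of UN and moves the bad-news observation into Press.

```python
# Toy model; all names, probabilities, and utilities are illustrative assumptions.
A2 = ["b0", "b1"]                    # hypothetical second-stage actions
PRESS = {"good_press", "bad_press"}  # observations in which the button is pressed

# P[a1][o] = probability of observation o after first-stage action a1.
P = {
    "honest": {"good_nopress": 0.5, "bad_nopress": 0.5},
    "rig": {"good_nopress": 0.5, "bad_press": 0.5},  # bad news triggers the button
}


def UN(a1, o, b):
    """Toy normal utility: good news is worth 10, rigging costs 1, b1 costs 2."""
    base = 10.0 if o.startswith("good") else 0.0
    cost = 1.0 if a1 == "rig" else 0.0
    penalty = 2.0 if b == "b1" else 0.0
    return base - cost - penalty


def best_un(a1, o):
    """max_b UN(a1, o, b)."""
    return max(UN(a1, o, b) for b in A2)


def expect_all(a1):
    """Expectation of max_b UN over all observations, given a1."""
    return sum(p * best_un(a1, o) for o, p in P[a1].items())


def expect_notpress(a1):
    """Expectation of max_b UN conditioned on O not in Press, given a1."""
    kept = {o: p for o, p in P[a1].items() if o not in PRESS}
    total = sum(kept.values())
    return sum(p * best_un(a1, o) for o, p in kept.items()) / total
```

With these numbers, the unconditioned expectation scores "honest" at 5.0 and "rig" at 4.0, while the expectation conditioned on NotPress scores "honest" at 5.0 and "rig" at 9.0: conditioning rewards shifting bad news into Press, which is the news-management incentive discussed in the thread.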