Third obvious possibility:
B maximises u ~ Σ p_i v_i, subject to the constraints E(Σ p_i v_i | B) ≥ E(Σ p_i v_i | A) and E(u|B) ≥ E(u|A),
where ~ is some simple combining operation like addition or multiplication, or "the product of the two terms divided by their sum".
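To make the proposal concrete, here is a minimal sketch of that selection rule in a toy setting where each action has a known E(u | action) and E(Σ p_i v_i | action); the names `actions`, `expected_u`, `expected_pv`, and `combine` are illustrative assumptions, not anything from the original discussion.

```python
# Minimal sketch of the "third possibility" (toy setting; all names illustrative).
# Each action has a known E(u | action) and E(sum p_i v_i | action).

def choose_action_B(actions, expected_u, expected_pv, combine):
    # Agent A's baseline: the action that maximises expected u alone.
    a_star = max(actions, key=expected_u)
    u_base, pv_base = expected_u(a_star), expected_pv(a_star)

    # B may only consider actions doing at least as well as A on both measures...
    feasible = [a for a in actions
                if expected_u(a) >= u_base and expected_pv(a) >= pv_base]

    # ...and among those picks the one maximising the combined score u ~ sum p_i v_i.
    return max(feasible, key=lambda a: combine(expected_u(a), expected_pv(a)))

# Candidate combining operations for ~:
combine_add   = lambda u, pv: u + pv
combine_mul   = lambda u, pv: u * pv
combine_ratio = lambda u, pv: (u * pv) / (u + pv)   # "product divided by sum"
```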
I think these possibilities all share the problem that the constraint makes it essentially impossible to choose any action other than the one A would have chosen. If A chose the action that maximised u, then B cannot choose any other action while satisfying the constraint E(u|B) ≥ E(u|A), unless there happen to be multiple actions with exactly the same expected payoff (which seems unlikely if payoffs are distributed over the reals rather than over a finite set). And the first possibility (maximise u while respecting E(Σ p_i v_i | B) ≥ E(Σ p_i v_i | A)) just results in choosing the exact same action as A would have chosen, even if there is another action with an identical E(u) AND a higher E(Σ p_i v_i).
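A toy numerical illustration of that objection, with made-up payoffs: once A's choice sets the baseline, the E(u|B) ≥ E(u|A) constraint rules out every other action, however well it does on Σ p_i v_i.

```python
# Hypothetical payoffs for three actions; A picks the one maximising E(u).
expected_u  = {"a1": 0.90, "a2": 0.87, "a3": 0.31}
expected_pv = {"a1": 0.10, "a2": 0.95, "a3": 0.99}

u_baseline = max(expected_u.values())               # A achieves 0.90 via a1
feasible = [a for a in expected_u if expected_u[a] >= u_baseline]
print(feasible)   # ['a1'] -- a2 has far higher E(sum p_i v_i) but is excluded
```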
I see I’ve miscommunicated the central idea. Let U be the proposition “the agent will remain a u-maximiser forever”. Agent A acts as if P(U) = 1 (see the entry on value learning). In reality, P(U) is probably very low. So A is a u-maximiser, but a u-maximiser that acts on false beliefs.
Agent B is allowed to have a better estimate of P(U). Therefore it can find actions that increase E(u) beyond what A would achieve.
Example: u values rubies deposited in the bank. A will just collect rubies until it can’t carry them any more, then go deposit them in the bank. B, knowing that u will change to something else before A has finished collecting rubies, rushes to the bank ahead of that deadline. So E(u|B) > E(u|A).
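A back-of-the-envelope version of the rubies example, with hypothetical numbers (the collection rate, travel time, and the hour at which u changes are all made up): only rubies banked before the value change count towards u, so A's plan scores zero while B's scores positively.

```python
# Hypothetical numbers for the rubies story: only rubies banked before the
# value change count towards u.
RATE = 1.0              # rubies collected per hour
TRIP_TO_BANK = 2.0      # hours needed to reach the bank and deposit
VALUE_CHANGE_AT = 10.0  # hour at which u gets replaced (A ignores this, B knows it)

def u_score(collect_hours):
    """Expected u: rubies deposited, but only if the deposit beats the deadline."""
    finish = collect_hours + TRIP_TO_BANK
    return RATE * collect_hours if finish <= VALUE_CHANGE_AT else 0.0

e_u_A = u_score(collect_hours=20)  # A fills its capacity and misses the deadline
e_u_B = u_score(collect_hours=8)   # B banks just before u changes
print(e_u_A, e_u_B)                # 0.0 vs 8.0, so E(u|B) > E(u|A)
```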
And, of course, if B can strictly increase E(u), that gives it some slack to select other actions that also increase E(Σ p_i v_i).
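Continuing the same toy numbers (still hypothetical), the slack looks like this: A's real performance sets the baseline, so once B knows P(U) < 1 several actions clear that baseline on E(u), and B can spend the difference on whichever of them does best on Σ p_i v_i.

```python
# Still hypothetical numbers. A's real performance is the baseline, so B has
# several feasible actions and can choose among them by E(sum p_i v_i).
candidates = {
    # action: (real E(u), E(sum p_i v_i))
    "collect_20h_then_bank": (0.0, 0.2),   # what A actually does
    "bank_at_8h":            (8.0, 0.3),
    "bank_at_7h_then_help":  (7.0, 0.9),   # a bit less u, much more sum p_i v_i
}
e_u_A = candidates["collect_20h_then_bank"][0]
feasible = {a: s for a, s in candidates.items() if s[0] >= e_u_A}
best = max(feasible, key=lambda a: feasible[a][1])
print(best)   # 'bank_at_7h_then_help' -- still satisfies E(u|B) >= E(u|A)
```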