I think these possibilities all share the problem that the constraint makes it essentially impossible to choose any action other than what A would have chosen.
I see I’ve miscommunicated the central idea. Let U be the proposition “the agent will remain a u-maximiser forever”. Agent A acts as if P(U) = 1 (see the entry on value learning). In reality, P(U) is probably very low. So A is a u-maximiser, but one that acts on false beliefs.
Agent B is allowed to have a better estimate of P(U). It can therefore find actions that achieve a higher expected u than A does.
Example: u values rubies deposited in the bank. A will just collect rubies until it can’t carry any more, then go deposit them in the bank. B, knowing that u will be replaced by something else before A has finished collecting, rushes to the bank ahead of that deadline. So E(u|B) > E(u|A).
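To make the comparison concrete, here is a toy numerical sketch of the ruby scenario. All of the rates, times, and the carrying capacity are made-up numbers for illustration only; the point is just that A, acting as if P(U) = 1, deposits too late, while B deposits before the change.

```python
# Toy model of the ruby example (illustrative numbers only).

RATE = 1          # rubies collected per time step
T_CHANGE = 5      # time at which u is replaced by something else
T_FULL = 10       # time at which carrying capacity is reached
TRIP = 1          # time steps needed to travel to the bank and deposit

def u_score(deposit_time, collected):
    """u only counts rubies that are in the bank before u is replaced."""
    return collected if deposit_time <= T_CHANGE else 0

# Agent A acts as if P(U) = 1: it collects until it is full, then heads
# to the bank -- by which point u has already been replaced.
a_collected = RATE * T_FULL
a_deposit_time = T_FULL + TRIP
E_u_A = u_score(a_deposit_time, a_collected)

# Agent B uses the better estimate of P(U): it stops collecting early
# and deposits just before the change.
b_collected = RATE * (T_CHANGE - TRIP)
b_deposit_time = T_CHANGE
E_u_B = u_score(b_deposit_time, b_collected)

print(f"E(u|A) = {E_u_A}")   # 0  -- A's rubies never reach the bank in time
print(f"E(u|B) = {E_u_B}")   # 4  -- fewer rubies, but they count
```

With these (invented) numbers, a smaller haul deposited in time beats a larger haul deposited too late, which is all the inequality E(u|B) > E(u|A) needs.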
And, of course, if B can strictly increase E(u), that gives it some slack to select other actions that can increase Σ p_i v_i.
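A minimal sketch of that “slack” point, assuming the constraint is something like “B’s chosen action must do at least as well on u as A’s would” (the actions, probabilities, and values below are invented for illustration): among all actions that clear that bar, B is free to pick the one that scores best on Σ p_i v_i.

```python
# Hypothetical illustration: among candidate actions that do at least as
# well on u as A's default action, pick the one maximising sum_i p_i * v_i.
# All names and numbers are made up.

candidate_actions = {
    # action: (E(u | action), [v_1(action), v_2(action)])
    "A_default":              (3.0, [1.0, 0.0]),
    "deposit_early":          (4.0, [1.0, 2.0]),
    "deposit_early_and_help": (3.5, [2.0, 3.0]),
}
p = [0.6, 0.4]  # probabilities assigned to the candidate future values v_1, v_2

baseline = candidate_actions["A_default"][0]

def mixture_score(values):
    """Weighted sum  sum_i p_i * v_i  for one action."""
    return sum(p_i * v_i for p_i, v_i in zip(p, values))

# Keep only actions that don't fall below A's expected u, then maximise the mixture.
admissible = {a: (eu, vs) for a, (eu, vs) in candidate_actions.items() if eu >= baseline}
best = max(admissible, key=lambda a: mixture_score(admissible[a][1]))
print(best)  # "deposit_early_and_help": same-or-better u, higher sum_i p_i v_i
```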