Hmm… I’m finding that I’m unable to write down a simple shutdown problem in this framework (e.g. an environment where it should switch between maximizing paperclips and shutting down) to analyze what this algorithm does. To know what the algorithm does, I need to know what $P$ and $\hat{P}$ are (since these are parameters of the algorithm). From those I can derive $P'$ and $\hat{P}'$ to determine the agent’s action. But at the moment I have no way of proceeding, since I don’t know what $P$ and $\hat{P}$ are. Can you get me unstuck?
Suppose the humans have already decided whether to press the shutdown button or order the AI to maximise paperclips. If $o_s$ is the observation of the shutdown command and $o_p$ the observation of the paperclip-maximising command, and $u_s$ and $u_p$ the relevant utilities, then $P$ can be defined by $P(u_s \mid h_{m-1} o_s) = 1$ and $P(u_p \mid h_{m-1} o_p) = 1$, for all histories $h_{m-1}$.
Then define $\hat{P}$ as the probability of $o_s$ versus $o_p$, conditional on the fact that the agent follows a particular deterministic policy $\pi_0$.
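For concreteness, one way to write this out (the parameter $q$ here is my own placeholder, not part of the original setup, standing for whatever credence the agent assigns to the humans having chosen shutdown):

$$\hat{P}(o_s \mid h_{m-1}, \pi_0) = q, \qquad \hat{P}(o_p \mid h_{m-1}, \pi_0) = 1 - q,$$

for some fixed $q \in [0,1]$ that does not depend on the agent’s subsequent actions, since the humans’ decision is taken to be already made.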
If the agent does indeed follow $\pi_0$, then $\hat{P} = \hat{P}'$. If it deviates from this policy, then $\hat{P}'$ is altered in proportion to the expected change in $\hat{P}$ caused by choosing a different action.
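To check my understanding, here is a minimal sketch of this toy shutdown problem in Python. Everything in it is my own illustration: the names, the hypothetical value of $q$, and in particular the adjustment rule in `P_hat_prime`, which reads “altered in proportion to the expected change” literally with proportionality constant 1.

```python
# Toy shutdown problem: two possible human commands and the induced
# probabilities, as sketched above. Illustration only; the names and the
# adjustment rule in P_hat_prime are my assumptions, not the original
# framework's definitions.

O_S, O_P = "o_s", "o_p"   # observe shutdown command / paperclip command
U_S, U_P = "u_s", "u_p"   # shutdown utility / paperclip utility

q = 0.3  # hypothetical probability that the humans chose shutdown

def P(utility, observation):
    """P(u | h_{m-1} o): the observed command pins down the utility,
    for every history h_{m-1}."""
    if observation == O_S:
        return 1.0 if utility == U_S else 0.0
    if observation == O_P:
        return 1.0 if utility == U_P else 0.0
    raise ValueError("unexpected observation")

def P_hat(observation):
    """^P: probability of o_s vs o_p, conditional on the agent following pi_0."""
    return q if observation == O_S else 1.0 - q

def P_hat_prime(observation, action, pi_0_action, p_obs_if_action):
    """^P': equals ^P when the agent's action matches pi_0's.
    On a deviation, shift ^P by the expected change that the deviating
    action causes (one literal reading of the prose above)."""
    baseline = P_hat(observation)
    if action == pi_0_action:
        return baseline
    return baseline + (p_obs_if_action - baseline)

# Example: if some deviating action makes the shutdown command certain,
# ^P'(o_s) moves from q up to 1.
print(P_hat_prime(O_S, action="press_button", pi_0_action="wait",
                  p_obs_if_action=1.0))  # -> 1.0
```

This is only meant to pin down $P$ and $\hat{P}$ concretely enough that the algorithm’s $P'$ and $\hat{P}'$ can then be traced through by hand.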