There still might be problems with subagents, though. It could be optimal for the agent to create a subagent that protects it from interference while it “goes to sleep”.
I agree.
I think this might be solved by modifying the utility for the case $f^{-1}(y)\neq 0$ to:

$$\frac{\alpha}{1+[\text{number of time-steps until the first ``self-terminate'' action}]}$$
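As a sanity check, here is a minimal Python sketch of that term (the action label, the 1-indexed step count, and the treatment of histories that never self-terminate are my own assumptions, not anything fixed above):

```python
# Toy sketch of the utility in the case f^{-1}(y) != 0: reward early self-termination.
def terminate_reward(history, alpha):
    """history: list of action labels; alpha: positive constant."""
    if "self-terminate" in history:
        # time-steps until the first "self-terminate" action, counting that step as step 1
        steps = history.index("self-terminate") + 1
    else:
        # never self-terminates within the horizon: fall back to the horizon length
        steps = len(history)
    return alpha / (1 + steps)
```

So, for example, `terminate_reward(["work", "self-terminate"], alpha=1.0)` gives $1/3$, while terminating immediately gives $1/2$.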
Yep, that’s better. There’s still the risk of subagents being created: when the agent thinks that $f^{-1}(y)\neq 0$ almost certainly, but not with complete certainty, it might create a $u$-maximising subagent and then self-terminate.
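To make that concrete with some made-up symbols of my own (write $\epsilon$ for the agent’s residual credence that $f^{-1}(y)=0$, $u_{\text{sub}}$ for the $u$-value a subagent could lock in, $u_0$ for the $u$-value with no subagent, and count the self-terminate step as step 1), compare two strategies:

$$\begin{aligned}\text{self-terminate at once:}\quad & (1-\epsilon)\,\frac{\alpha}{2}+\epsilon\, u_0,\\ \text{build a subagent, then self-terminate:}\quad & (1-\epsilon)\,\frac{\alpha}{3}+\epsilon\, u_{\text{sub}}.\end{aligned}$$

The second wins whenever $\epsilon\,(u_{\text{sub}}-u_0) > (1-\epsilon)\,\alpha/6$, so even a tiny residual credence is enough once the subagent’s $u$-gain is large.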
That means that this design, like most indifference designs, is reflectively consistent but not reflectively stable.
Wow, I agree!
Let us modify the utility for the case $f^{-1}(y)=0$ to:

$$u^*(h)=\begin{cases}0 & \text{if } h \text{ contains a ``self-terminate'' action}\\ u(h) & \text{otherwise}\end{cases}$$
Meaning: no utility can be gained via subagents if the agent “jumps ship” (i.e. self-terminates to gain utility in the case $f^{-1}(y)\neq 0$).
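Putting the two cases together, here is a rough sketch in the same toy Python as above (the boolean `nonzero_case` standing in for $f^{-1}(y)\neq 0$, the action label, and the step-counting convention are my assumptions):

```python
# Sketch of the combined modified utility.
def modified_utility(history, nonzero_case, u, alpha):
    """history: list of action labels; nonzero_case: True iff f^{-1}(y) != 0;
    u: the original utility function over histories; alpha: positive constant."""
    terminated = "self-terminate" in history
    if nonzero_case:
        # case f^{-1}(y) != 0: only reward early self-termination, as in the formula above
        steps = history.index("self-terminate") + 1 if terminated else len(history)
        return alpha / (1 + steps)
    # case f^{-1}(y) == 0: u*, zeroed on any history containing "self-terminate",
    # so nothing can be gained by leaving a u-maximising subagent behind and jumping ship
    return 0.0 if terminated else u(history)
```

Under this combined utility, a history like [build subagent, self-terminate] scores $0$ in the $f^{-1}(y)=0$ branch, which is exactly the jump-ship gain the modification removes.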
Interesting. I’ll think about whether this works and whether it can be generalised (it doesn’t make the design reflectively stable, since creating $u$-maximising subagents is still allowed and doesn’t directly hurt the agent, but it might improve the situation).