Stuart_Armstrong comments on Wireheading as a potential problem with the new impact measure

Stuart_Armstrong 28 Sep 2018 8:15 UTC
LW: 2 AF: 1
Apologies for missing the intent verification part of your post.

But I don’t think it achieves what it sets out to do. Any action that doesn’t optimise $u_{A}$ can be roughly decomposed into a $u_{A}$-increasing part and a $u_{A}$-decreasing part (for instance, if $u_{A}$ is about making coffee, then making sure that the agent doesn’t crush the baby is a $u_{A}$-cost).

Therefore, at a sufficient level of granularity, every non-$u_{A}$-optimal policy includes actions that decrease $u_{A}$. Thus this approach cannot distinguish between 2) and 3).
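
To make the granularity claim concrete, here is a minimal sketch. It assumes a toy table of per-step $u_{A}$ values; the action names, the numbers, and the helper `flagged_steps` are all hypothetical and are not taken from the original post. It checks each step of a plan against two possible baselines: the best available action (the reading used in the argument above) and doing nothing (the baseline that replies in this thread attribute to intent verification).

```python
from typing import Dict, List

# Hypothetical toy values: q[t][action] is the attainable u_A value if
# `action` is taken at step t. All numbers are invented for illustration.
NOOP = "noop"


def flagged_steps(plan: List[str], q: List[Dict[str, float]], baseline: str) -> List[int]:
    """Return the steps at which the chosen action scores below the baseline.

    baseline="optimal": compare with the best action available at that step.
    baseline="noop":    compare with doing nothing at that step.
    """
    flagged = []
    for t, action in enumerate(plan):
        if baseline == "optimal":
            reference = max(q[t].values())
        elif baseline == "noop":
            reference = q[t][NOOP]
        else:
            raise ValueError(f"unknown baseline: {baseline}")
        if q[t][action] < reference:
            flagged.append(t)
    return flagged


q_values = [
    {"noop": 0.0, "make_one_clip": 1.0, "make_many_clips": 5.0},
    {"noop": 1.0, "confirm_stop": 1.0, "make_more": 5.0},
]
modest_plan = ["make_one_clip", "confirm_stop"]

print(flagged_steps(modest_plan, q_values, "optimal"))  # [0, 1]
print(flagged_steps(modest_plan, q_values, "noop"))     # []
```

In this toy table, every step of the modest plan is flagged when measured against the optimal action, and none is flagged when measured against doing nothing. Which baseline applies, and what values a wireheading step would receive, is exactly what is under dispute, so the sketch leaves those open.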
- Rohin Shah 1 Oct 2018 2:45 UTC
  LW: 2 AF: 2
  I was also confused by intent verification. The confusion went away after I figured out two things:
  - $u_{A}$ is not the same thing as $u_{A}^{''}$.
  - Each action in the plan is compared to the baseline of doing nothing, not to the baseline of the optimal plan.
- TurnTrout 28 Sep 2018 13:31 UTC
  LW: 1 AF: 1
  This isn’t true. Some suboptimal actions are also better than doing nothing. For example, if you don’t avoid crushing the baby, you might be shut off. Or, making one paperclip is better than nothing. There should still be “gentle”, low-impact, granular $u_{A}$-optimizing plans that aren’t literally the max-impact $u_{A}$-optimal plan.
  
  To what extent this holds is an open question. Suggestions on further relaxing IV are welcome.
  - Stuart_Armstrong 1 Oct 2018 12:41 UTC
    LW: 3 AF: 2
    > For example, if you don’t avoid crushing the baby, you might be shut off.
    
    In that case, avoiding the baby is the optimal decision, not suboptimal.
    
    > Or, making one paperclip is better than nothing.
    
    PM (Paperclip Machine): Insert number of paperclips to be made.
    A: 1.
    PM: Are you sure you don’t want to make any more paperclips Y/N?
    A: Y.
    
    Then “Y” is clearly a suboptimal action from the paperclip-making perspective. Contrast:
    
    PM: Are you sure you don’t want me to wirehead you to avoid the penalty Y/N?
    A: Y.
    
    Now, these two examples seem a bit silly; if you want, we could discuss them more and try to refine what is different about them. But my two main arguments are:
    
    1. Any suboptimal policy, if we look at it in a granular enough way (or replace it with an equivalent policy/environment, and look at that in a granular enough way), will include individual actions that are suboptimal (e.g. not budgeting more energy for the paperclip machine than is needed to make one paperclip).
    2. In consequence, IV does not distinguish between wireheading and other limited-impact, not-completely-optimal policies.
    
    Would you like to Skype or PM to resolve this issue?
    - TurnTrout 1 Oct 2018 13:53 UTC
      1 point
      Sure, let’s do that!