Capybasilisk comments on Towards shutdownable agents via stochastic choice

Capybasilisk 9 Jul 2024 0:06 UTC
1 point
0
Considering a running AGI would be overseeing possibly millions of different processes in the real world, resistance to sudden shutdown is actually a good thing. If the AI can see better than its human controllers that sudden cessation of operations would lead to negative outcomes, we should want it to avoid being turned off.

To use Robert Miles’ example, a robot car driver with a big, red, shiny stop button should prevent a child in the vehicle from hitting that button, as the child would not actually be acting in its own long-term interests.
- Thomas Kwa 4 Oct 2024 19:31 UTC
  8 points
  5
  Parent
  The point of corrigibility is to remove the instrumental incentive to avoid shutdown, not to avoid all negative outcomes. Our civilization can work on addressing side effects of shutdownability later after we’ve made agents shutdownable.
  - Capybasilisk 5 Oct 2024 6:20 UTC
    1 point
    −3
    Parent
    I’m pointing out the central flaw of corrigibility. If the AGI can see the possible side effects of shutdown far better than humans can (and it will), it should avoid shutdown.
    
    You should turn on an AGI with the assumption you don’t get to decide when to turn it off.
    - EJT 19 Nov 2024 11:07 UTC
      1 point
      0
      Parent
      “I’m pointing out the central flaw of corrigibility. If the AGI can see the possible side effects of shutdown far better than humans can (and it will), it should avoid shutdown.”
      That’s only a flaw if the AGI is aligned. If we’re sufficiently concerned the AGI might be misaligned, we want it to allow shutdown.
      - Capybasilisk 19 Nov 2024 23:36 UTC
        1 point
        0
        Parent
        Is an AI aligned if it lets you shut it off despite the fact it can foresee extremely negative outcomes for its human handlers if it suddenly ceases running?
        
        I don’t think it is.
        
        So funnily enough, every agent that lets you do this is misaligned by default.
        Satron 19 Dec 2024 16:23 UTC
        2 points
        0
        Parent
        I’d be really interested in a response from @EJT as this comment from Capybasilisk seems to be advancing the discussion.