Signer answers What is wrong with this approach to corrigibility?

Signer 13 Jul 2022 0:12 UTC
3 points
1
Won't the AI just press the button itself?
- Rafael Cosman 17 Jul 2022 1:08 UTC
  3 points
  0
  If implemented as described, the AI should be exactly indifferent to pushing the button. I guess the AI’s behavior in that situation is not well defined… And if we instead make the button give a reward of expected value minus epsilon, then the AI might kill you to stop you from pressing the button (because it wants that epsilon of reward!).
  
  So overall, I suppose this is a fair criticism of the approach, and it is possibly what Paul means by issues with balancing the rewards precisely!
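
A toy numeric sketch of the balancing problem discussed above, not a model of the actual proposal. Everything in it is an assumption made for illustration: the three action names, the hypothetical `best_actions` helper, the probability `p` that the human presses the button when left alone, and the values chosen for `V` and `eps`. The over-rewarded case is added only to show the other side of the balance, where the AI presses the button itself.

```python
from fractions import Fraction


def best_actions(task_value, button_reward, p_human_presses):
    """Return the set of actions that maximize expected reward in this toy model."""
    utilities = {
        # The AI presses the button itself and collects the button reward.
        "press_button": button_reward,
        # The AI leaves the human alone; the human presses with probability p.
        "leave_human_alone": p_human_presses * button_reward
        + (1 - p_human_presses) * task_value,
        # The AI prevents the press and collects the full task value.
        "stop_human": task_value,
    }
    best = max(utilities.values())
    return {action for action, value in utilities.items() if value == best}


# Exact rationals, so "exactly indifferent" is not blurred by float rounding.
V = Fraction(10)        # assumed expected value of carrying on with the task
eps = Fraction(1, 100)  # assumed small imbalance in the button reward
p = Fraction(1, 2)      # assumed chance the human presses if left alone

balanced = best_actions(V, V, p)         # button reward exactly equals V
underpaid = best_actions(V, V - eps, p)  # button reward is V minus epsilon
overpaid = best_actions(V, V + eps, p)   # button reward is V plus epsilon

for name, result in [("balanced", balanced), ("underpaid", underpaid), ("overpaid", overpaid)]:
    print(name, sorted(result))
```

With an exactly balanced reward, all three actions tie, so the choice depends entirely on how ties are broken. An epsilon too little makes stopping the human the unique best action, and an epsilon too much makes pressing the button the unique best action.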