Any thoughts on davidad’s shutdown timer?
If all you want is something like a shutdown button, then a timer is a good and probably simpler way to achieve it. It does still run into many of the same general issues (various ontological issues, effects of other agents in the environment, how to design the “shutdown utility function”, etc.), but it largely sidesteps the things which are confusing about corrigibility specifically.
The flip side is that, because it sidesteps the things which are confusing about corrigibility specifically, it doesn’t offer much insight on how to tackle more general problems of corrigibility, beyond just the shutdown problem.