In addition, there are powerful shutdownable systems: KataGo can beat me at Go but doesn’t prevent itself from being shut down for instrumental reasons, whereas humans generally will.
KataGo seems to be a system that is causally downstream of a process that has made it good at Go. To attempt to prevent itself from being shut down, KataGo would need to have some model of what it means to be ‘shut down’.
Comparing KataGo to humans when it comes to shutdownability is evidence of confusion.