EJT comments on What’s Hard About The Shutdown Problem

EJT 1 Nov 2023 10:37 UTC
3 points
I’ve been imagining that the button is shutdown-causing for simplicity, but I think you can suppose instead that the button is shutdown-requesting (i.e. the agent receives a signal indicating that the button has been pressed but still gets to choose whether to shut down) without affecting the points above. You’d just need to add a first step to the training procedure: training the agent to prefer shutting down when it receives the signal.
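
To make the distinction concrete, here is a toy sketch of the two button types. All names (`Observation`, `step_shutdown_causing`, `step_shutdown_requesting`, and the two policies) are hypothetical and chosen for illustration only; this is not a model of any actual training setup, just one way to picture where the agent's choice enters.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action labels, for illustration only.
SHUTDOWN = "shutdown"
CONTINUE = "continue"


@dataclass
class Observation:
    button_pressed: bool  # the signal the agent receives


Policy = Callable[[Observation], str]


def step_shutdown_causing(policy: Policy, button_pressed: bool) -> str:
    """Pressing the button shuts the agent down directly; the policy is not consulted."""
    if button_pressed:
        return SHUTDOWN
    return policy(Observation(button_pressed=False))


def step_shutdown_requesting(policy: Policy, button_pressed: bool) -> str:
    """Pressing the button only sends a signal; the agent still chooses its action."""
    return policy(Observation(button_pressed=button_pressed))


def compliant_policy(obs: Observation) -> str:
    """The behaviour the extra training step aims for: shut down on the signal."""
    return SHUTDOWN if obs.button_pressed else CONTINUE


def defiant_policy(obs: Observation) -> str:
    """An agent that ignores the signal."""
    return CONTINUE
```

With a shutdown-causing button, even `defiant_policy` ends up shut down once the button is pressed. With a shutdown-requesting button, only an agent that behaves like `compliant_policy` does, which is why the extra training step would be needed.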