CBiddulph comments on A Shutdown Problem Proposal

CBiddulph 22 Jan 2024 21:06 UTC
3 points

First and most important, there’s the choice of “default action”. We probably want the default action to be not-too-bad by the human designers’ values; the obvious choice is a “do nothing” action. But then, in order for the AI to do anything at all, the “shutdown” utility function must somehow be able to do better than the “do nothing” action. Otherwise, that subagent would just always veto and be quite happy doing nothing.

Can we solve this problem by setting the default action to “do nothing” and then giving the agent an extra action, “do nothing and give the shutdown subagent +1 reward”?
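To make the question concrete, here is a minimal Python sketch of the setup as I understand it. Everything in it is an assumption for illustration: the action names, the utility numbers, and especially the veto rule, which I've written two ways because the quoted passage doesn't say how a subagent treats an action it values exactly as much as the default.

```python
# Hypothetical toy model: two subagents, each with veto power over any
# non-default action. All names and numbers are illustrative assumptions.
DEFAULT = "do_nothing"
BONUS = "do_nothing_plus_bonus"  # "do nothing and give the shutdown subagent +1 reward"

UTILITIES = {
    # The shutdown subagent gets +1 from the bonus action.
    "shutdown": {DEFAULT: 0.0, BONUS: 1.0},
    # The other subagent is assumed to be indifferent between the two.
    "task": {DEFAULT: 0.0, BONUS: 0.0},
}


def vetoes(utility, action, strict):
    """Return True if a subagent with this utility table vetoes `action`.

    strict=True:  veto unless the action is strictly better than the default.
    strict=False: veto only if the action is strictly worse than the default.
    """
    if strict:
        return utility[action] <= utility[DEFAULT]
    return utility[action] < utility[DEFAULT]


def surviving_actions(actions, utilities, strict):
    """List the non-default actions that no subagent vetoes."""
    return [
        a
        for a in actions
        if a != DEFAULT
        and not any(vetoes(u, a, strict) for u in utilities.values())
    ]


if __name__ == "__main__":
    actions = [DEFAULT, BONUS]
    print("weak veto rule:  ", surviving_actions(actions, UTILITIES, strict=False))
    print("strict veto rule:", surviving_actions(actions, UTILITIES, strict=True))
```

In this toy version, the shutdown subagent never vetoes the bonus action under either rule, which is the point of the proposal. Whether the action is actually taken depends on how the other subagent handles indifference, and the sketch takes no position on which rule the post intends.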