I think the initial (2-agent) model only has two time steps, ie one opportunity for the button to be pressed. The goal is just for the agent to be corrigible for this single button-press opportunity.
I think the initial (2-agent) model only has two time steps, ie one opportunity for the button to be pressed. The goal is just for the agent to be corrigible for this single button-press opportunity.