johnswentworth comments on A Shutdown Problem Proposal

johnswentworth 22 Jan 2024 19:43 UTC
I don’t know. Thornley’s proposal got me thinking about subagents as a tool for corrigibility, but I never understood his properties well enough to say how his subagents relate to the counterfactual-optimizing agents in this proposal.