I don’t know. Thornley’s proposal got me thinking about subagents as a tool for corrigibility, but I never understood his properties well enough to say how his subagents relate to the counterfactual-optimizing agents in this proposal.