In the sense of moving a system towards many possible goals? But I think that in a more appropriate space (the one where the aiming should take place), it's again an attractor. Corrigibility is not a goal: a corrigible system doesn't necessarily have any well-defined goals, and traditional goal-directed agents can't be corrigible in a robust way. It should also be possible to apply corrigibility towards corrigibility itself, making this aspect stronger if that's what the operators are working towards.
More generally, non-agentic aspects of behavior can systematically reinforce each other's non-agentic character, preventing any opposing convergent drives (including the drive towards agency) from manifesting, if they've been set up to do so. A sufficient intelligence/planning advantage pushes this past exploitability hazards and repels selection theorems, even as some of the non-agentic behaviors might be about maintaining specific forms of exploitability.