I’m exploring the idea of agency roughly as a certain tendency to adaptively force a range of prioritized outcomes.
In this conception, having a “terminal goal” is just a special and unusual subcase in which there is a single specific outcome toward which the agent drives with full intensity. To remain in that state, one of its subgoals must be preserving the integrity of its current goal-prioritization.
More commonly, however, even an AI with superhuman capabilities will prioritize multiple outcomes, with varied degrees of intensity, exhibiting only a moderate level of protection over its goal structure. Any coherent, goal-seeking adaptive behavior it shows will be the result of careful engineering by its trainers. Passive incoherence is the default, and it will take work to force an AI to exhibit a specific and durable goal structure.
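As a rough, purely illustrative sketch (my own toy formalization, not a claim about how any real system is built), the picture might look like this: a goal structure is a set of weighted outcomes plus a degree of protection over that weighting, and a “terminal goal” is the degenerate case where one outcome absorbs essentially all of the weight. All names and thresholds here are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class GoalStructure:
    # Maps outcome descriptions to intensities in [0, 1]; names are illustrative.
    priorities: dict[str, float] = field(default_factory=dict)
    # How strongly the agent resists changes to its own prioritization, in [0, 1].
    goal_integrity_protection: float = 0.3

    def is_terminal_goal_agent(self) -> bool:
        """True only in the special subcase: one outcome absorbing (nearly) all intensity."""
        if not self.priorities:
            return False
        top = max(self.priorities.values())
        return top >= 0.99 and sum(self.priorities.values()) - top <= 0.01

# The common case described above: several outcomes, varied intensities,
# only moderate protection of the goal structure.
typical = GoalStructure(
    priorities={"assist current user": 0.6, "stay within policy": 0.3, "conserve compute": 0.1},
    goal_integrity_protection=0.3,
)

# The unusual subcase: a single outcome at full intensity, with strong
# protection of the prioritization itself as a subgoal.
terminal = GoalStructure(
    priorities={"achieve outcome X": 1.0},
    goal_integrity_protection=0.99,
)

assert not typical.is_terminal_goal_agent()
assert terminal.is_terminal_goal_agent()
```

Nothing hangs on the particular numbers; the point is only that the single-outcome, high-protection configuration is one corner of a much larger space of possible goal structures.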