I agree with your intuition that an agent should be allowed to be bad at accomplishing its purpose.
To me the issue is that you’re leaving out self-awareness of the goal. That is, to me what makes an agent fully agentic is that it not only is trying to do something but it knows it is trying to do something. This creates a feedback loop within itself that helps keep it on target.
Many agentic-ish systems, like RL systems, sort of look like this, but the feedback loop that keeps them on target exists outside themselves, so the agent is actually the RL system plus the human researchers running it. Or you have undirected systems like evolution that look sort of agentic but aren’t: you can “trick” them into doing things that are “not their purpose” because they don’t really have one. They just execute with no sense of purpose, even if their pattern of behavior is well predicted by modeling them as if they had goals.
Notice that I didn’t use the term agent, because I personally believe that goal-directedness and agency are distinct (though probably linked). So I agree with your intuition that an agent should probably know its goal, but I disagree with the proposal that it must do so to be goal-directed towards that goal.
That being said, I do agree that there is a difference in kind between the result of RL and an optimizer when following a goal. One way we (Joe Collman, Michele Campolo, Sabrina Tang and I) think about it is through the “source of directedness”: self-directed (like an optimizer), hardcoded in a direction (like an optimized system), or even self-directed with some constraint or initial direction (a mesa-optimizer, that is, an optimized optimizer).
they just execute with no sense of purpose, even if their pattern of behavior is well predicted by modeling them as if they had goals.
On that part I agree with Dennett that the definition of having a goal is for the pattern of behavior to be well predicted by modeling the system as having goals (taking the intentional stance). It seems you disagree, but maybe this boils down to the separation between goal-directedness and agency I pointed out above.