I usually think of this in terms of Dennett’s concept of the intentional stance, according to which there is no fact of the matter of whether something is an agent or not. But there is a fact of the matter of whether we can usefully predict its behavior by modeling it as if it was an agent with some set of beliefs and goals.
For example, even though the calculations of a chess-playing computer have practically nothing in common with human thought, its moves can still be effectively predicted by assuming that it “wants” to win at chess and “knows” the rules of chess. This gives rise to the prediction that it will always choose, from the list of viable moves, one which best furthers the goal of winning the game. Even though the best move may not be obvious, adopting the intentional stance still allows the human observer to improve on their predictions of what the computer would do, by eliminating obvious bad moves.
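To make the "eliminating obvious bad moves" point concrete, here is a minimal sketch that swaps chess for a much smaller game (single-pile Nim, where each player takes 1 to 3 stones and whoever takes the last stone wins). The game choice and the names `mover_can_win` and `predicted_moves` are assumptions made for illustration; nothing here is from Dennett or the post. The observer knows nothing about how the machine computes, and only assumes it "wants" to win and "knows" the rules.

```python
from functools import lru_cache

MAX_TAKE = 3  # assumed rule: a player may take 1, 2, or 3 stones per turn


def legal_moves(stones: int) -> range:
    """All moves the rules allow from a pile of `stones`."""
    return range(1, min(MAX_TAKE, stones) + 1)


@lru_cache(maxsize=None)
def mover_can_win(stones: int) -> bool:
    """True if the player about to move can force a win.

    With 0 stones there are no legal moves, so the player to move has
    already lost (the opponent took the last stone).
    """
    return any(not mover_can_win(stones - take) for take in legal_moves(stones))


def predicted_moves(stones: int) -> list:
    """Intentional-stance prediction: the machine picks a move that furthers winning.

    Moves that hand the opponent a winning position are eliminated. If every
    move is losing, the stance gives no further guidance and all legal moves remain.
    """
    good = [take for take in legal_moves(stones) if not mover_can_win(stones - take)]
    return good or list(legal_moves(stones))


print(predicted_moves(10))  # three legal moves narrowed to one: [2]
print(predicted_moves(8))   # no winning move exists, so nothing is eliminated: [1, 2, 3]
```

The second call is the "but not perfectly" case: when the attributed goal does not single out a move, the observer's prediction is no better than knowing the rules alone.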
There is no observer-independent “fact of the matter” of whether a system is or is not an “agent”. However, there is an objective fact of the matter about how well a particular system’s behavior is modeled by the intentional stance, from the point of view of a given observer. There are, objectively, patterns in the observable behavior of an intentional system that correspond to what we call “beliefs” and “desires”, and these patterns explain or predict the behavior of the system unusually well (but not perfectly) for how simple they are. [...]
There are several approaches one might take to predicting the future behavior of some system; Dennett compares three: the physical stance, the design stance, and the intentional stance.
In adopting the physical stance towards a system, you utilize an understanding of the laws of physics to predict a system’s behavior from its physical constitution and its physical interactions with its environment. One simple example of a situation where the physical stance is most useful is in predicting the trajectory of a rock sliding down a slope; one would be able to get very precise and accurate predictions with knowledge of the laws of motion, gravitation, friction, etc. In principle (and presuming physicalism), this stance is capable of predicting in full the behavior of everything from quantum mechanical systems to human beings to the entire future of the whole universe.
With the design stance, by contrast, “one ignores the actual (possibly messy) details of the physical constitution of an object, and, on the assumption that it has a certain design, predicts that it will behave as it is designed to behave under various circumstances.” For example, humans almost never consider what their computers are doing on a physical level, unless something has gone wrong; by default, we operate on the level of a user interface, which was designed in order to abstract away messy details that would otherwise hamper our ability to interact with the systems.
Finally, there’s the intentional stance:
Here is how it works: first you decide to treat the object whose behavior is to be predicted as a rational agent; then you figure out what beliefs that agent ought to have, given its place in the world and its purpose. Then you figure out what desires it ought to have, on the same considerations, and finally you predict that this rational agent will act to further its goals in the light of its beliefs. A little practical reasoning from the chosen set of beliefs and desires will in many—but not all—instances yield a decision about what the agent ought to do; that is what you predict the agent will do.
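The recipe in that passage (attribute beliefs, attribute desires, predict the action that best furthers the desires given the beliefs) can be written down as a rough sketch. Everything below is an illustrative assumption rather than anything Dennett specifies: the thermostat, its two actions, the one-degree effect of each action, and the function names are all hypothetical.

```python
from typing import Callable, Iterable


def predict_action(
    actions: Iterable[str],
    believed_outcome: Callable[[str], float],
    desirability: Callable[[float], float],
) -> str:
    """Predict what a rational agent would do: the action whose believed
    outcome it desires most."""
    return max(actions, key=lambda action: desirability(believed_outcome(action)))


def predict_thermostat(believed_temp: float, target_temp: float = 20.0) -> str:
    """Treat a thermostat as an agent that 'believes' the room is at
    `believed_temp` and 'desires' it to be at `target_temp`."""
    # Assumed beliefs about what each action does to the temperature.
    effects = {"heat_on": +1.0, "heat_off": -1.0}
    return predict_action(
        actions=effects,
        believed_outcome=lambda action: believed_temp + effects[action],
        desirability=lambda temp: -abs(temp - target_temp),
    )


print(predict_thermostat(17.0))  # heat_on
print(predict_thermostat(23.0))  # heat_off
```

Note that the prediction uses no facts about bimetallic strips or wiring. It only needs the attributed belief and desire, which is what makes the model cheap.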
Before further unpacking the intentional stance, one helpful analogy might be that the three stances can be understood as providing gears-level models for the system under consideration, at different levels of abstraction.[2] For purposes of illustration, imagine we want to model the behavior of a housekeeping robot:
The physical stance gives us a gears-level model where the gears are the literal gears (or other physical components) of the robot.
The design stance gives us a gears-level model where the gears come from the level of abstraction at which the system was designed. The gears could be e.g. the CPU, memory, etc., on the hardware side, or on the level of the robot’s user interface, on the software side.
The intentional stance gives us a gears-level model where the relevant gears are the robot’s beliefs, desires, goals, etc. [...]
Now that he’s described how we attribute beliefs and desires to systems that seem to us to have intentions of one kind or another, “the next task would seem to be distinguishing those intentional systems that really have beliefs and desires from those we may find it handy to treat as if they had beliefs and desires.” (For example, although a thermostat’s behavior can be understood under the intentional stance, most people intuitively feel that a thermostat doesn’t “really” have beliefs.) This, however, cautions Dennett, would be a mistake.
As a thought experiment, Dennett asks us to imagine that some superintelligent Martians descend upon us; to them, we’re as simple as thermostats are to us. If they were capable of predicting the activities of human society on a microphysical level, without ever treating any of us as intentional systems, it seems fair to say that we wouldn’t “really” be believers, to them. This shows that intentionality is somewhat observer-relative—whether or not a system has intentions depends on the modeling capabilities of the observer.
However, this is not to say that intentionality is completely subjective, far from it—there are objective patterns in the observables corresponding to what we call “beliefs” and “desires.” (Although Dennett is careful to emphasize that these patterns don’t allow one to perfectly predict behavior; it’s that they predict the data unusually well for how simple they are. For one, your ability to model an intentional system will fail under certain kinds of distributional shifts; analogously, understanding a computer under the design stance does not allow one to make accurate predictions about what it will do when submerged in liquid helium.) [...]
If something appears agent-y to us (i.e., we intuitively use the intentional strategy to describe its behavior), our next question tends to be, “but is it really an agent?” (It’s unclear what exactly is meant by this question in general, but it might be interpreted as asking whether some parts of the system correspond to explicit representations of beliefs and/or desires.) In the context of AI safety, we often talk about whether or not the systems we build “will or won’t be agents,” whether or not we should build agents, etc.
One of Dennett’s key messages with the intentional stance is that this is a fundamentally confused question. What it really and truly means for a system to “be an agent” is that its behavior is reliably predictable by the intentional strategy; all questions of internal cognitive or mechanistic implementation of such behavior are secondary. (Put crudely, if it looks to us like an agent, and we don’t have an equally-good-or-better alternative for understanding that system’s behavior, well, then it is one.) In fact, once you have perfectly understood the internal functional mechanics of a system that externally appears to be an agent (i.e. you can predict its behavior more accurately than with the intentional stance, albeit with much more information), that system stops looking like “an agent,” for all intents and purposes. (At least, modeling the system as such becomes only one potential model for understanding the system’s behavior, which you might still use in certain contexts e.g. for efficient inference or real-time action.)
We should therefore be more careful to recognize that the extent to which AIs will “really be agents” is just the extent to which our best model of their behavior is of them having beliefs, desires, goals, etc. If GPT-N appears really agent-y with the right prompting, and we can’t understand this behavior under the design stance (how it results from predicting the most likely continuation of the prompt, given a giant corpus of internet text) or a “mechanistic” stance (how individual neurons, small circuits, and/or larger functional modules interacted to produce the output), then GPT-N with that prompting really is an agent.
That sounds an awful lot like asserting agency to be a mind projection fallacy.
That seems maybe true. What’s the problem you see with that?
I find it difficult to believe that there can be no objective criteria for recognising agency when there are objective criteria for building agents.
If you are willing to countenance counterfactuals, it’s possible to get more rigorous about “seems like an agent”. A system is goal-driven if it would have displayed different behavior under different circumstances to achieve the same goal, i.e. it avoids obstacles. A system has a utility function if there is a part of the system you can change to make it achieve different goals, in the preceding sense.
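As a rough sketch of how those two counterfactual tests could be run, assume a toy 4x4 grid world where a "system" is any function from a wall layout to the path it takes. The grid, the breadth-first planner, the fixed-script non-agent, and every name below are hypothetical choices made for illustration, and this is only one reading of the criteria.

```python
from collections import deque

START = (0, 0)
SIZE = 4  # assumed 4x4 grid


def plan_path(goal, walls):
    """Breadth-first search from START to `goal`, avoiding `walls`."""
    queue = deque([[START]])
    seen = {START}
    while queue:
        path = queue.popleft()
        x, y = path[-1]
        if (x, y) == goal:
            return path
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE
            if inside and nxt not in walls and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # goal unreachable


def make_planner(goal):
    """A system with an adjustable part: the goal it plans towards."""
    return lambda walls: plan_path(goal, walls)


def fixed_script(walls):
    """A non-agent: replays the same moves and stops when it hits a wall."""
    path = [START]
    for cell in ((1, 0), (2, 0), (3, 0)):
        if cell in walls:
            break
        path.append(cell)
    return path


def looks_goal_driven(system, circumstances):
    """Different behavior across circumstances, same end state in all of them."""
    paths = [system(walls) for walls in circumstances]
    if any(path is None for path in paths):
        return False
    endpoints = {path[-1] for path in paths}
    behaviors = {tuple(path) for path in paths}
    return len(endpoints) == 1 and len(behaviors) > 1


def has_adjustable_goal(make_system, settings, circumstances):
    """Changing one part of the system yields goal-driven behavior towards different ends."""
    endpoints = set()
    for setting in settings:
        system = make_system(setting)
        if not looks_goal_driven(system, circumstances):
            return False
        endpoints.add(system(circumstances[0])[-1])
    return len(endpoints) > 1


circumstances = [frozenset(), frozenset({(2, 0)})]  # open grid vs. one obstacle
print(looks_goal_driven(make_planner((3, 0)), circumstances))  # True
print(looks_goal_driven(fixed_script, circumstances))          # False
print(has_adjustable_goal(make_planner, [(3, 0), (3, 3)], circumstances))  # True
```

The planner routes around the obstacle and still ends at the same cell, while the fixed script ends somewhere else as soon as its usual route is blocked. Both checks only look at behavior under interventions, not at the internals of the system.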