This literature review on goal-directedness identifies five different properties that should be true for a system to be described as goal-directed:
1. **Restricted space of goals:** The space of goals should not be too expansive, since otherwise goal-directedness can <@become vacuous@>(@Coherence arguments do not imply goal-directed behavior@) (e.g. if we allow arbitrary functions over world-histories with no additional assumptions).
2. **Explainability:** A system should be described as goal-directed when doing so improves our ability to _explain_ the system’s behavior and _predict_ what it will do.
3. **Generalization:** A goal-directed system should adapt its behavior in the face of changes to its environment, such that it continues to pursue its goal.
4. **Far-sighted:** A goal-directed system should consider the long-term consequences of its actions.
5. **Efficient:** The more goal-directed a system is, the more efficiently it should achieve its goal.
The concepts of goal-directedness, optimization, and agency seem to have significant overlap, but there are differences in the ways the terms are used. One common difference is that goal-directedness is often understood as a _behavioral_ property of agents, whereas optimization is thought of as a _mechanistic_ property about the agent’s internal cognition.
The authors then compare multiple proposals on these criteria:
1. The _intentional stance_ says that we should model a system as goal-directed when doing so helps us better explain the system’s behavior. This performs well on explainability and generalization, and could easily be extended to include far-sightedness as well. A more efficient system for some goal will be easier to explain via the intentional stance, so it does well on that criterion too. And not every possible function can serve as a goal, since many are so complicated that positing them would not improve our explanations of behavior. However, the biggest issue is that the intentional stance cannot be easily formalized.
2. One possible formalization of the intentional stance is to say that a system is goal-directed when we can better explain the system’s behavior as maximizing a specific utility function, relative to explaining it using an input-output mapping (see <@Agents and Devices: A Relative Definition of Agency@>). This also does well on all five criteria.
3. <@AGI safety from first principles@> proposes another set of criteria that have a lot of overlap with the five criteria above.
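The second proposal above can be illustrated with a toy sketch. This is a hypothetical minimal-description-length version, in the spirit of (but much simpler than) the Bayesian model comparison in Agents and Devices; the states, actions, and goal hypothesis class are all invented for illustration. We score two explanations of the same observed behavior: a raw input-output table (the "device" view) versus "the system maximizes this goal" (the "agent" view), and call the system goal-directed when the agent explanation is cheaper.

```python
import math

# Observed behavior: state -> chosen action, with 3 possible actions.
# (Toy data invented for this sketch.)
observed = {0: 0, 1: 1, 2: 2, 3: 0}
N_ACTIONS = 3

# A small hypothesis class of candidate goals (hypothetical): each goal
# specifies which action is best in a given state.
goals = {
    "match_state_mod_3": lambda s: s % 3,
    "always_act_0":      lambda s: 0,
}

def table_cost(behavior):
    """Description length of a raw input-output table: one action per state."""
    return len(behavior) * math.log2(N_ACTIONS)

def agent_cost(behavior, goal):
    """Description length under 'agent maximizing this goal': pay to pick
    the goal out of the hypothesis class, plus one action's worth of bits
    for every observed action that deviates from the goal's optimum."""
    cost = math.log2(len(goals))
    mismatches = sum(1 for s, a in behavior.items() if a != goal(s))
    return cost + mismatches * math.log2(N_ACTIONS)

best_goal, best = min(
    ((name, agent_cost(observed, g)) for name, g in goals.items()),
    key=lambda kv: kv[1],
)
device = table_cost(observed)
print(f"device (table) cost: {device:.2f} bits")
print(f"best agent cost ({best_goal}): {best:.2f} bits")
print("goal-directed?", best < device)
```

Note how this toy version already reflects several of the five criteria: a restricted goal class keeps the agent explanation from being vacuous, and a more efficient system (fewer deviations from the optimum) gets a cheaper agent explanation.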
Thanks for the inclusion in the newsletter and the opinion! (And sorry for taking so long to answer)
> This literature review on goal-directedness identifies five different properties that should be true for a system to be described as goal-directed:
It’s implicit, but I think it should be made explicit that the properties/tests are what we extract from the literature, not what we say is fundamental. More specifically, we don’t say they should be true per se; we just extract and articulate them to “force” a discussion of them when defining goal-directedness.
> One common difference is that goal-directedness is often understood as a _behavioral_ property of agents, whereas optimization is thought of as a _mechanistic_ property about the agent’s internal cognition.
I don’t think the post says that. We present both a behavioral (à la Dennett) view of goal-directedness as well as an internal property one (like what Evan or Richard discuss); same for the two forms of optimization considered.
> It’s implicit, but I think it should be made explicit that the properties/tests are what we extract from the literature, not what we say is fundamental.
Changed to “This post extracts five different concepts that have been identified in the literature as properties of goal-directed systems:”.
> One common difference is that goal-directedness is often understood as a _behavioral_ property of agents, whereas optimization is thought of as a _mechanistic_ property about the agent’s internal cognition.

Deleted that sentence.
Thanks!