This literature review on goal-directedness identifies five different properties that should be true for a system to be described as goal-directed:
1. **Restricted space of goals:** The space of goals should not be too expansive, since otherwise goal-directedness can <@become vacuous@>(@Coherence arguments do not imply goal-directed behavior@) (e.g. if we allow arbitrary functions over world-histories with no additional assumptions).
2. **Explainability:** A system should be described as goal-directed when doing so improves our ability to _explain_ the system’s behavior and _predict_ what it will do.
3. **Generalization:** A goal-directed system should adapt its behavior in the face of changes to its environment, such that it continues to pursue its goal.
4. **Far-sighted:** A goal-directed system should consider the long-term consequences of its actions.
5. **Efficient:** The more goal-directed a system is, the more efficiently it should achieve its goal.
The concepts of goal-directedness, optimization, and agency seem to have significant overlap, but there are differences in the ways the terms are used. One common difference is that goal-directedness is often understood as a _behavioral_ property of agents, whereas optimization is thought of as a _mechanistic_ property about the agent’s internal cognition.
The authors then compare multiple proposals on these criteria:
1. The _intentional stance_ says that we should model a system as goal-directed when doing so helps us better explain the system’s behavior. This performs well on explainability and generalization, and could easily be extended to include far-sightedness as well. A more efficient system for some goal will be easier to explain via the intentional stance, so it does well on that criterion too. And not every possible function can serve as a goal, since many are so complicated that positing them would not improve our explanations of behavior. However, the biggest issue is that the intentional stance cannot be easily formalized.
2. One possible formalization of the intentional stance is to say that a system is goal-directed when we can better explain the system’s behavior as maximizing a specific utility function, relative to explaining it using an input-output mapping (see <@Agents and Devices: A Relative Definition of Agency@>). This also does well on all five criteria.
3. <@AGI safety from first principles@> proposes another set of criteria that have a lot of overlap with the five criteria above.
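The second proposal above can be illustrated with a toy sketch. This is a hypothetical minimal-description-length version, in the spirit of (but much simpler than) the Bayesian model comparison in Agents and Devices; the states, actions, and goal hypothesis class are all invented for illustration. We score two explanations of the same observed behavior: a raw input-output table (the "device" view) versus "the system maximizes this goal" (the "agent" view), and call the system goal-directed when the agent explanation is cheaper.

```python
import math

# Observed behavior: state -> chosen action, with 3 possible actions.
# (Toy data invented for this sketch.)
observed = {0: 0, 1: 1, 2: 2, 3: 0}
N_ACTIONS = 3

# A small hypothesis class of candidate goals (hypothetical): each goal
# specifies which action is best in a given state.
goals = {
    "match_state_mod_3": lambda s: s % 3,
    "always_act_0":      lambda s: 0,
}

def table_cost(behavior):
    """Description length of a raw input-output table: one action per state."""
    return len(behavior) * math.log2(N_ACTIONS)

def agent_cost(behavior, goal):
    """Description length under 'agent maximizing this goal': pay to pick
    the goal out of the hypothesis class, plus one action's worth of bits
    for every observed action that deviates from the goal's optimum."""
    cost = math.log2(len(goals))
    mismatches = sum(1 for s, a in behavior.items() if a != goal(s))
    return cost + mismatches * math.log2(N_ACTIONS)

best_goal, best = min(
    ((name, agent_cost(observed, g)) for name, g in goals.items()),
    key=lambda kv: kv[1],
)
device = table_cost(observed)
print(f"device (table) cost: {device:.2f} bits")
print(f"best agent cost ({best_goal}): {best:.2f} bits")
print("goal-directed?", best < device)
```

Note how this toy version already reflects several of the five criteria: a restricted goal class keeps the agent explanation from being vacuous, and a more efficient system (fewer deviations from the optimum) gets a cheaper agent explanation.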
Thanks for the inclusion in the newsletter and the opinion! (And sorry for taking so long to answer)
> This literature review on goal-directedness identifies five different properties that should be true for a system to be described as goal-directed:
It’s implicit, but I think it should be made explicit that the properties/tests are what we extract from the literature, not what we say is fundamental. More specifically, we don’t say they should be true per se; we just extract and articulate them to “force” a discussion of them when defining goal-directedness.
> One common difference is that goal-directedness is often understood as a _behavioral_ property of agents, whereas optimization is thought of as a _mechanistic_ property about the agent’s internal cognition.
I don’t think the post says that. We present both a behavioral (à la Dennett) view of goal-directedness as well as an internal property one (like what Evan or Richard discuss); same for the two forms of optimization considered.
> It’s implicit, but I think it should be made explicit that the properties/tests are what we extract from the literature, not what we say is fundamental.
Changed to “This post extracts five different concepts that have been identified in the literature as properties of goal-directed systems:”.
> One common difference is that goal-directedness is often understood as a _behavioral_ property of agents, whereas optimization is thought of as a _mechanistic_ property about the agent’s internal cognition.

Deleted that sentence.
Thanks!