
Goal-Directedness

Last edit: 4 Jan 2023 3:03 UTC by Daniel_Eth

Goal-directedness is the property a system has when it is aiming at some goal. The concept is still in need of formalization, but it might prove important in deciding which kind of AI to try to align.

A goal may be defined as a world-state that an agent tries to achieve. Goal-directed agents may generate internal representations of desired end states, compare them against their internal representation of the current state of the world, and formulate plans for navigating from the latter to the former.
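A minimal sketch of that compare-and-plan loop, with hypothetical names (GoalDirectedAgent, planner) chosen purely for illustration rather than taken from any particular system:

```python
# Illustrative sketch only: hold an internal representation of the desired end
# state, compare it with the current state, and plan a route between them.
class GoalDirectedAgent:
    def __init__(self, goal_state, planner):
        self.goal_state = goal_state   # internal representation of the desired end state
        self.planner = planner         # e.g. a search-based planner like the A* sketch below

    def next_step(self, current_state):
        if current_state == self.goal_state:
            return None                # the world already matches the goal representation
        plan = self.planner(current_state, self.goal_state)
        return plan[1] if plan else None   # first move of the plan (plan[0] is the current state)
```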

The goal-generating function may be derived from a pre-programmed lookup table (for simple worlds), from directly inverting the agent’s utility function (for simple utility functions), or it may be learned through experience mapping states to rewards and predicting which states will produce the largest rewards. The plan-generating algorithm could range from shortest-path algorithms like A* or Dijkstra’s algorithm (for fully-representable world graphs), to policy functions that learn through RL which actions bring the current state closer to the goal state (for simple AI), to some combination or extrapolation (for more advanced AI).
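As a concrete illustration of the planning half, here is a sketch of A* search finding a path to a goal state in a fully-representable grid world. The grid, start, goal, and Manhattan-distance heuristic below are illustrative assumptions, not taken from the text:

```python
import heapq


def a_star(grid, start, goal):
    """Return a list of grid cells from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # admissible heuristic: Manhattan distance to the goal state
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]        # (estimated total cost, cost so far, cell)
    came_from, best_cost = {start: None}, {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:                     # reconstruct the plan (path of cells)
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                if g + 1 < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = g + 1
                    came_from[nxt] = cell
                    heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return None


# 0 = free cell, 1 = obstacle; start in the top-left, goal in the bottom-left.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(a_star(grid, start=(0, 0), goal=(2, 0)))
```

The returned path can serve as the "plan" consumed by the agent loop sketched earlier.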

Implicit goal-directedness may come about in agents that do not have explicit internal representations of goals but that nevertheless learn or enact policies that cause the environment to converge on a certain state or set of states. Such implicit goal-directedness may arise, for instance, in simple reinforcement learning agents, which learn a policy function that maps states directly to actions.
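A toy example of such implicit goal-directedness (all parameters below are illustrative assumptions): a tabular Q-learning agent on a short corridor never stores a goal representation, only a state-action value table, yet the greedy policy it learns reliably drives the environment to the rewarded state:

```python
import random
from collections import defaultdict

N_STATES, GOAL = 6, 5            # states 0..5; reward only on entering state 5
ACTIONS = (-1, +1)               # move left / move right
q = defaultdict(float)           # q[(state, action)] -> estimated return
alpha, gamma = 0.1, 0.95
random.seed(0)

for episode in range(200):
    s = 0
    while s != GOAL:
        a = random.choice(ACTIONS)                 # random exploration (Q-learning is off-policy)
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == GOAL else 0.0
        best_next = max(q[(s_next, act)] for act in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s_next

# The learned policy maps states directly to actions; it "aims at" state 5
# without ever representing it as a goal. Expected output: [1, 1, 1, 1, 1]
print([max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(GOAL)])
```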

Literature Review on Goal-Directedness

18 Jan 2021 11:15 UTC
80 points
21 comments, 31 min read, LW link

FAQ: What the heck is goal agnosticism?

porby, 8 Oct 2023 19:11 UTC
66 points
36 comments, 28 min read, LW link

Coherence arguments do not entail goal-directed behavior

Rohin Shah, 3 Dec 2018 3:26 UTC
133 points
69 comments, 7 min read, LW link, 3 reviews

Behavioral Sufficient Statistics for Goal-Directedness

adamShimi, 11 Mar 2021 15:01 UTC
21 points
12 comments, 9 min read, LW link

Goal-directedness is behavioral, not structural

adamShimi, 8 Jun 2020 23:05 UTC
6 points
12 comments, 3 min read, LW link

Focus: you are allowed to be bad at accomplishing your goals

adamShimi, 3 Jun 2020 21:04 UTC
19 points
17 comments, 3 min read, LW link

Deliberation Everywhere: Simple Examples

Oliver Sourbut, 27 Jun 2022 17:26 UTC
27 points
3 comments, 15 min read, LW link

Goal-directed = Model-based RL?

adamShimi, 20 Feb 2020 19:13 UTC
21 points
10 comments, 3 min read, LW link

AI safety without goal-directed behavior

Rohin Shah, 7 Jan 2019 7:48 UTC
68 points
15 comments, 4 min read, LW link

Goal-Directedness: What Success Looks Like

adamShimi, 16 Aug 2020 18:33 UTC
9 points
0 comments, 2 min read, LW link

Will humans build goal-directed agents?

Rohin Shah, 5 Jan 2019 1:33 UTC
61 points
43 comments, 5 min read, LW link

Measuring Coherence of Policies in Toy Environments

18 Mar 2024 17:59 UTC
59 points
9 comments, 14 min read, LW link

Intuitions about goal-directed behavior

Rohin Shah, 1 Dec 2018 4:25 UTC
54 points
15 comments, 6 min read, LW link

Goals and short descriptions

Michele Campolo, 2 Jul 2020 17:41 UTC
14 points
8 comments, 5 min read, LW link

Locality of goals

adamShimi, 22 Jun 2020 21:56 UTC
16 points
8 comments, 6 min read, LW link

Searching for Search

28 Nov 2022 15:31 UTC
94 points
9 comments, 14 min read, LW link, 1 review

Goal-Directedness and Behavior, Redux

adamShimi, 9 Aug 2021 14:26 UTC
16 points
4 comments, 2 min read, LW link

P₂B: Plan to P₂B Better

24 Oct 2021 15:21 UTC
38 points
17 comments, 6 min read, LW link

Goal-directedness: my baseline beliefs

Morgan_Rogers, 8 Jan 2022 13:09 UTC
21 points
3 comments, 3 min read, LW link

Goal-directedness: exploring explanations

Morgan_Rogers, 14 Feb 2022 16:20 UTC
13 points
3 comments, 18 min read, LW link

Goal-directedness: imperfect reasoning, limited knowledge and inaccurate beliefs

Morgan_Rogers, 19 Mar 2022 17:28 UTC
4 points
1 comment, 21 min read, LW link

[Question] why assume AGIs will optimize for fixed goals?

nostalgebraist, 10 Jun 2022 1:28 UTC
144 points
57 comments, 4 min read, LW link, 2 reviews

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley, 6 Jul 2024 1:23 UTC
58 points
39 comments, 24 min read, LW link

wrapper-minds are the enemy

nostalgebraist, 17 Jun 2022 1:58 UTC
102 points
43 comments, 8 min read, LW link

Goal-directedness: tackling complexity

Morgan_Rogers, 2 Jul 2022 13:51 UTC
8 points
0 comments, 38 min read, LW link

Refinement of Active Inference agency ontology

Roman Leventov, 15 Dec 2023 9:31 UTC
16 points
0 comments, 5 min read, LW link
(arxiv.org)

Think carefully before calling RL policies “agents”

TurnTrout, 2 Jun 2023 3:46 UTC
133 points
36 comments, 4 min read, LW link

Capabilities and alignment of LLM cognitive architectures

Seth Herd, 18 Apr 2023 16:29 UTC
86 points
18 comments, 20 min read, LW link

Optimization Concepts in the Game of Life

16 Oct 2021 20:51 UTC
75 points
16 comments, 10 min read, LW link

A review of “Agents and Devices”

adamShimi, 13 Aug 2021 8:42 UTC
21 points
0 comments, 4 min read, LW link

Finding Goals in the World Model

22 Aug 2022 18:06 UTC
59 points
8 comments, 13 min read, LW link

Applications for Deconfusing Goal-Directedness

adamShimi, 8 Aug 2021 13:05 UTC
38 points
3 comments, 5 min read, LW link, 1 review

When Most VNM-Coherent Preference Orderings Have Convergent Instrumental Incentives

TurnTrout, 9 Aug 2021 17:22 UTC
53 points
4 comments, 5 min read, LW link

Value loading in the human brain: a worked example

Steven Byrnes, 4 Aug 2021 17:20 UTC
45 points
2 comments, 8 min read, LW link

In Defense of Wrapper-Minds

Thane Ruthenis, 28 Dec 2022 18:28 UTC
24 points
38 comments, 3 min read, LW link

Quick thoughts on the implications of multi-agent views of mind on AI takeover

Kaj_Sotala, 11 Dec 2023 6:34 UTC
46 points
14 comments, 4 min read, LW link

Evil autocomplete: Existential Risk and Next-Token Predictors

Yitz, 28 Feb 2023 8:47 UTC
9 points
3 comments, 5 min read, LW link

Towards a Mechanistic Understanding of Goal-Directedness

Mark Xu, 9 Mar 2021 20:17 UTC
46 points
1 comment, 5 min read, LW link

Super-Luigi = Luigi + (Luigi - Waluigi)

Alexei, 17 Mar 2023 15:27 UTC
16 points
9 comments, 1 min read, LW link

An Appeal to AI Superintelligence: Reasons to Preserve Humanity

James_Miller, 18 Mar 2023 16:22 UTC
37 points
73 comments, 12 min read, LW link

Against the Backward Approach to Goal-Directedness

adamShimi, 19 Jan 2021 18:46 UTC
19 points
6 comments, 4 min read, LW link

Locally optimal psychology

Chipmonk, 25 Nov 2024 18:35 UTC
37 points
7 comments, 2 min read, LW link
(twitter.com)

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)

Joe Carlsmith, 29 Nov 2023 16:32 UTC
29 points
1 comment, 11 min read, LW link

Empirical Observations of Objective Robustness Failures

23 Jun 2021 23:23 UTC
63 points
5 comments, 9 min read, LW link

Understanding mesa-optimization using toy models

7 May 2023 17:00 UTC
43 points
2 comments, 10 min read, LW link

Measuring Coherence and Goal-Directedness in RL Policies

dx26, 22 Apr 2024 18:26 UTC
10 points
0 comments, 7 min read, LW link

Emotional issues often have an immediate payoff

Chipmonk, 10 Jun 2024 23:39 UTC
26 points
2 comments, 4 min read, LW link
(chrislakin.blog)

[Interim research report] Evaluating the Goal-Directedness of Language Models

18 Jul 2024 18:19 UTC
39 points
4 comments, 11 min read, LW link

Don’t want Goodhart? — Specify the damn variables

Yan Lyutnev, 21 Nov 2024 22:45 UTC
−3 points
2 comments, 5 min read, LW link

Don’t want Goodhart? — Specify the variables more

YanLyutnev, 21 Nov 2024 22:43 UTC
2 points
2 comments, 5 min read, LW link

More experiments in GPT-4 agency: writing memos

Christopher King, 24 Mar 2023 17:51 UTC
5 points
2 comments, 10 min read, LW link

Does GPT-4 exhibit agency when summarizing articles?

Christopher King, 24 Mar 2023 15:49 UTC
16 points
2 comments, 5 min read, LW link

100 Dinners And A Workshop: Information Preservation And Goals

Stephen Fowler, 28 Mar 2023 3:13 UTC
8 points
0 comments, 7 min read, LW link

GPT-4 busted? Clear self-interest when summarizing articles about itself vs when article talks about Claude, LLaMA, or DALL·E 2

Christopher King, 31 Mar 2023 17:05 UTC
6 points
4 comments, 4 min read, LW link

Imagine a world where Microsoft employees used Bing

Christopher King, 31 Mar 2023 18:36 UTC
6 points
2 comments, 2 min read, LW link

Agentized LLMs will change the alignment landscape

Seth Herd, 9 Apr 2023 2:29 UTC
159 points
97 comments, 3 min read, LW link

Towards an Ethics Calculator for Use by an AGI

sweenesm, 12 Dec 2023 18:37 UTC
3 points
2 comments, 11 min read, LW link

Investigating Emergent Goal-Like Behavior in Large Language Models using Experimental Economics

phelps-sg, 5 May 2023 11:15 UTC
6 points
1 comment, 4 min read, LW link

GPT-4 implicitly values identity preservation: a study of LMCA identity management

Ozyrus, 17 May 2023 14:13 UTC
21 points
4 comments, 13 min read, LW link

Creating a self-referential system prompt for GPT-4

Ozyrus, 17 May 2023 14:13 UTC
3 points
1 comment, 3 min read, LW link

Superintelligence 15: Oracles, genies and sovereigns

KatjaGrace, 23 Dec 2014 2:01 UTC
11 points
30 comments, 7 min read, LW link

[Question] Clarifying how misalignment can arise from scaling LLMs

Util, 19 Aug 2023 14:16 UTC
3 points
1 comment, 1 min read, LW link

Discussion: Objective Robustness and Inner Alignment Terminology

23 Jun 2021 23:25 UTC
73 points
7 comments, 9 min read, LW link

Grokking the Intentional Stance

jbkjr, 31 Aug 2021 15:49 UTC
45 points
22 comments, 20 min read, LW link, 1 review

[Question] Does Agent-like Behavior Imply Agent-like Architecture?

Scott Garrabrant, 23 Aug 2019 2:01 UTC
66 points
8 comments, 1 min read, LW link

Framing approaches to alignment and the hard problem of AI cognition

ryan_greenblatt, 15 Dec 2021 19:06 UTC
16 points
15 comments, 27 min read, LW link

Breaking Down Goal-Directed Behaviour

Oliver Sourbut, 16 Jun 2022 18:45 UTC
11 points
1 comment, 2 min read, LW link

Deliberation, Reactions, and Control: Tentative Definitions and a Restatement of Instrumental Convergence

Oliver Sourbut, 27 Jun 2022 17:25 UTC
12 points
0 comments, 11 min read, LW link

Convergence Towards World-Models: A Gears-Level Model

Thane Ruthenis, 4 Aug 2022 23:31 UTC
38 points
1 comment, 13 min read, LW link

How I think about alignment

Linda Linsefors, 13 Aug 2022 10:01 UTC
31 points
11 comments, 5 min read, LW link

Discovering Agents

zac_kenton, 18 Aug 2022 17:33 UTC
73 points
11 comments, 6 min read, LW link

Goal-directedness: relativising complexity

Morgan_Rogers, 18 Aug 2022 9:48 UTC
3 points
0 comments, 11 min read, LW link

Two senses of “optimizer”

Joar Skalse, 21 Aug 2019 16:02 UTC
35 points
41 comments, 3 min read, LW link

Value Formation: An Overarching Model

Thane Ruthenis, 15 Nov 2022 17:16 UTC
34 points
20 comments, 34 min read, LW link

The Alignment Problem from a Deep Learning Perspective (major rewrite)

10 Jan 2023 16:06 UTC
84 points
8 comments, 39 min read, LW link
(arxiv.org)

How evolutionary lineages of LLMs can plan their own future and act on these plans

Roman Leventov, 25 Dec 2022 18:11 UTC
39 points
16 comments, 8 min read, LW link

The Waluigi Effect (mega-post)

Cleo Nardo, 3 Mar 2023 3:22 UTC
630 points
187 comments, 16 min read, LW link