Goal-Directedness

Goal-Directedness is the property a system has when it is aiming at some goal. The concept is still in need of formalization, but it may prove important in deciding which kind of AI to try to align.

A goal may be defined as a world-state that an agent tries to achieve. Goal-directed agents may generate internal representations of desired end states, compare them against their internal representation of the current state of the world, and formulate plans for navigating from the latter to the former.
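As a toy illustration of that loop, here is a minimal sketch (all names are hypothetical, not drawn from any particular system) of an agent that stores an explicit goal representation, checks it against the current state, and delegates to a planner to close the gap:

```python
# Minimal sketch of an explicitly goal-directed agent: it holds an internal
# representation of the desired end state, compares it with the current state,
# and formulates a plan to get from one to the other. Names are illustrative.
from typing import Callable, Hashable, List

State = Hashable
Action = str


class GoalDirectedAgent:
    def __init__(self, goal: State, planner: Callable[[State, State], List[Action]]):
        self.goal = goal        # internal representation of the desired end state
        self.planner = planner  # maps (current state, goal state) -> action sequence

    def next_actions(self, current_state: State) -> List[Action]:
        if current_state == self.goal:
            return []           # the desired end state has already been reached
        # Formulate a plan for navigating from the current state to the goal.
        return self.planner(current_state, self.goal)
```

Where the goal representation and the planner come from is the interesting design question, which the next paragraph takes up.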

The goal-generating function may be derived from a pre-programmed lookup table (for simple worlds), obtained by directly inverting the agent’s utility function (for simple utility functions), or learned from experience by mapping states to rewards and predicting which states will yield the largest rewards. The plan-generating algorithm could range from shortest-path algorithms such as A* or Dijkstra’s algorithm (for fully representable world graphs), to policy functions that learn through RL which actions bring the current state closer to the goal state (for simple AI), to some combination or extrapolation of the two (for more advanced AI).
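For the fully-representable-world case, plan generation can be as simple as graph search. The sketch below (the toy world graph and its costs are invented for illustration) uses Dijkstra’s algorithm to produce a route from the current state to the goal state:

```python
# Toy illustration of plan generation over a fully representable world graph:
# Dijkstra's algorithm returns the cheapest sequence of states from the agent's
# current state to its goal state. The "world" below is made up for the example.
import heapq
from typing import Dict, List, Optional, Tuple

# Each state maps to {neighbouring state: cost of moving there}.
WORLD: Dict[str, Dict[str, float]] = {
    "start":   {"hallway": 1.0},
    "hallway": {"start": 1.0, "kitchen": 1.0, "office": 3.0},
    "kitchen": {"hallway": 1.0, "office": 1.0},
    "office":  {"hallway": 3.0, "kitchen": 1.0},
}


def dijkstra_plan(current: str, goal: str) -> Optional[List[str]]:
    """Return the cheapest sequence of states from `current` to `goal`, if any."""
    frontier: List[Tuple[float, str, List[str]]] = [(0.0, current, [current])]
    visited = set()
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if state in visited:
            continue
        visited.add(state)
        for neighbour, step_cost in WORLD[state].items():
            if neighbour not in visited:
                heapq.heappush(frontier, (cost + step_cost, neighbour, path + [neighbour]))
    return None  # the goal state is unreachable from the current state


print(dijkstra_plan("start", "office"))  # ['start', 'hallway', 'kitchen', 'office']
```

Replacing the hand-written graph with a learned world model, and the exhaustive search with a learned policy, moves this sketch toward the RL end of the spectrum described above.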

Implicit goal-directedness may come about in agents that do not have explicit internal representations of goals but that nevertheless learn or enact policies that cause the environment to converge on a certain state or set of states. Such implicit goal-directedness may arise, for instance, in simple reinforcement learning agents, which learn a policy function that maps states directly to actions.
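As a minimal sketch of such implicit goal-directedness (the environment and hyperparameters below are invented for illustration), consider a tabular Q-learning agent on a short corridor: it never stores a representation of a goal state, yet the policy it learns reliably steers the environment into the rewarded cell:

```python
# Tabular Q-learning on a 1-D corridor. The agent only learns a table of
# (state, action) values and acts greedily on it; it has no explicit goal
# representation, but its learned policy drives the environment to cell 4.
import random
from collections import defaultdict

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode with reward
ACTIONS = [-1, +1]    # step left or step right

q_table = defaultdict(float)          # (state, action) -> learned action value
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for _ in range(500):
    state = 0
    while state != N_STATES - 1:
        values = [q_table[(state, a)] for a in ACTIONS]
        # Epsilon-greedy choice, breaking ties randomly while values are equal.
        if random.random() < epsilon or values[0] == values[1]:
            action = random.choice(ACTIONS)
        else:
            action = ACTIONS[values.index(max(values))]
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        best_next = max(q_table[(next_state, a)] for a in ACTIONS)
        # Standard Q-learning update: only action values, no goal representation.
        q_table[(state, action)] += alpha * (reward + gamma * best_next - q_table[(state, action)])
        state = next_state

# The learned greedy policy should step right (+1) from every non-terminal cell,
# so the environment converges on cell 4 even though "cell 4" is never stored
# as a goal anywhere in the agent.
print([max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)])
```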

Literature Review on Goal-Directedness
Jan 18, 2021, 11:15 AM · 80 points · 21 comments · 31 min read · LW link

Coherence arguments do not entail goal-directed behavior
Rohin Shah · Dec 3, 2018, 3:26 AM · 134 points · 69 comments · 7 min read · LW link · 3 reviews

FAQ: What the heck is goal agnosticism?
porby · Oct 8, 2023, 7:11 PM · 66 points · 38 comments · 28 min read · LW link

Behavioral Sufficient Statistics for Goal-Directedness
adamShimi · Mar 11, 2021, 3:01 PM · 21 points · 12 comments · 9 min read · LW link

AI safety without goal-directed behavior
Rohin Shah · Jan 7, 2019, 7:48 AM · 68 points · 15 comments · 4 min read · LW link

Deliberation Everywhere: Simple Examples
Oliver Sourbut · Jun 27, 2022, 5:26 PM · 27 points · 3 comments · 15 min read · LW link

Goal-directed = Model-based RL?
adamShimi · Feb 20, 2020, 7:13 PM · 21 points · 10 comments · 3 min read · LW link

Focus: you are allowed to be bad at accomplishing your goals
adamShimi · Jun 3, 2020, 9:04 PM · 19 points · 17 comments · 3 min read · LW link

Goal-Directedness: What Success Looks Like
adamShimi · Aug 16, 2020, 6:33 PM · 9 points · 0 comments · 2 min read · LW link

Intuitions about goal-directed behavior
Rohin Shah · Dec 1, 2018, 4:25 AM · 54 points · 15 comments · 6 min read · LW link

Measuring Coherence of Policies in Toy Environments
Mar 18, 2024, 5:59 PM · 59 points · 9 comments · 14 min read · LW link

Will humans build goal-directed agents?
Rohin Shah · Jan 5, 2019, 1:33 AM · 61 points · 43 comments · 5 min read · LW link

Goal-directedness is behavioral, not structural
adamShimi · Jun 8, 2020, 11:05 PM · 6 points · 12 comments · 3 min read · LW link

Locality of goals
adamShimi · Jun 22, 2020, 9:56 PM · 16 points · 8 comments · 6 min read · LW link

Goals and short descriptions
Michele Campolo · Jul 2, 2020, 5:41 PM · 14 points · 8 comments · 5 min read · LW link

Goal-Directedness and Behavior, Redux
adamShimi · Aug 9, 2021, 2:26 PM · 16 points · 4 comments · 2 min read · LW link

Searching for Search
Nov 28, 2022, 3:31 PM · 97 points · 9 comments · 14 min read · LW link · 1 review

P₂B: Plan to P₂B Better
Oct 24, 2021, 3:21 PM · 38 points · 17 comments · 6 min read · LW link

Refinement of Active Inference agency ontology
Roman Leventov · Dec 15, 2023, 9:31 AM · 16 points · 0 comments · 5 min read · LW link · (arxiv.org)

wrapper-minds are the enemy
nostalgebraist · Jun 17, 2022, 1:58 AM · 102 points · 43 comments · 8 min read · LW link

Goal-directedness: tackling complexity
Morgan_Rogers · Jul 2, 2022, 1:51 PM · 8 points · 0 comments · 38 min read · LW link

Think carefully before calling RL policies “agents”
TurnTrout · Jun 2, 2023, 3:46 AM · 133 points · 38 comments · 4 min read · LW link · 1 review

Finding Goals in the World Model
Aug 22, 2022, 6:06 PM · 59 points · 8 comments · 13 min read · LW link

In Defense of Wrapper-Minds
Thane Ruthenis · Dec 28, 2022, 6:28 PM · 24 points · 38 comments · 3 min read · LW link

Evil autocomplete: Existential Risk and Next-Token Predictors
Yitz · Feb 28, 2023, 8:47 AM · 9 points · 3 comments · 5 min read · LW link

Super-Luigi = Luigi + (Luigi—Waluigi)
Alexei · Mar 17, 2023, 3:27 PM · 16 points · 9 comments · 1 min read · LW link

An Appeal to AI Superintelligence: Reasons to Preserve Humanity
James_Miller · Mar 18, 2023, 4:22 PM · 41 points · 73 comments · 12 min read · LW link

A “Bitter Lesson” Approach to Aligning AGI and ASI
RogerDearnaley · Jul 6, 2024, 1:23 AM · 60 points · 39 comments · 24 min read · LW link

Locally optimal psychology
Chipmonk · Nov 25, 2024, 6:35 PM · 37 points · 7 comments · 2 min read · LW link · (twitter.com)

Against the Backward Approach to Goal-Directedness
adamShimi · Jan 19, 2021, 6:46 PM · 19 points · 6 comments · 4 min read · LW link

Towards a Mechanistic Understanding of Goal-Directedness
Mark Xu · Mar 9, 2021, 8:17 PM · 46 points · 1 comment · 5 min read · LW link

Creating Complex Goals: A Model to Create Autonomous Agents
theraven · Mar 13, 2025, 6:17 PM · 6 points · 1 comment · 6 min read · LW link

Quick thoughts on the implications of multi-agent views of mind on AI takeover
Kaj_Sotala · Dec 11, 2023, 6:34 AM · 47 points · 14 comments · 4 min read · LW link

Value loading in the human brain: a worked example
Steven Byrnes · Aug 4, 2021, 5:20 PM · 45 points · 2 comments · 8 min read · LW link

When Most VNM-Coherent Preference Orderings Have Convergent Instrumental Incentives
TurnTrout · Aug 9, 2021, 5:22 PM · 53 points · 4 comments · 5 min read · LW link

Applications for Deconfusing Goal-Directedness
adamShimi · Aug 8, 2021, 1:05 PM · 38 points · 3 comments · 5 min read · LW link · 1 review

A review of “Agents and Devices”
adamShimi · Aug 13, 2021, 8:42 AM · 21 points · 0 comments · 4 min read · LW link

Optimization Concepts in the Game of Life
Oct 16, 2021, 8:51 PM · 75 points · 16 comments · 10 min read · LW link

Capabilities and alignment of LLM cognitive architectures
Seth Herd · Apr 18, 2023, 4:29 PM · 86 points · 18 comments · 20 min read · LW link

Goal-directedness: my baseline beliefs
Morgan_Rogers · Jan 8, 2022, 1:09 PM · 21 points · 3 comments · 3 min read · LW link

Goal-directedness: exploring explanations
Morgan_Rogers · Feb 14, 2022, 4:20 PM · 13 points · 3 comments · 18 min read · LW link

Goal-directedness: imperfect reasoning, limited knowledge and inaccurate beliefs
Morgan_Rogers · Mar 19, 2022, 5:28 PM · 4 points · 1 comment · 21 min read · LW link

[Question] why assume AGIs will optimize for fixed goals?
nostalgebraist · Jun 10, 2022, 1:28 AM · 145 points · 58 comments · 4 min read · LW link · 2 reviews

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
Joe Carlsmith · Nov 29, 2023, 4:32 PM · 29 points · 1 comment · 11 min read · LW link

Empirical Observations of Objective Robustness Failures
Jun 23, 2021, 11:23 PM · 63 points · 5 comments · 9 min read · LW link

Understanding mesa-optimization using toy models
May 7, 2023, 5:00 PM · 43 points · 2 comments · 10 min read · LW link

Measuring Coherence and Goal-Directedness in RL Policies
dx26 · Apr 22, 2024, 6:26 PM · 10 points · 0 comments · 7 min read · LW link

Emotional issues often have an immediate payoff
Chipmonk · Jun 10, 2024, 11:39 PM · 26 points · 2 comments · 4 min read · LW link · (chrislakin.blog)

ParaScopes: Do Language Models Plan the Upcoming Paragraph?
NickyP · Feb 21, 2025, 4:50 PM · 36 points · 2 comments · 20 min read · LW link

[Interim research report] Evaluating the Goal-Directedness of Language Models
Jul 18, 2024, 6:19 PM · 39 points · 4 comments · 11 min read · LW link

Don’t want Goodhart? — Specify the damn variables
Yan Lyutnev · Nov 21, 2024, 10:45 PM · −3 points · 2 comments · 5 min read · LW link

Don’t want Goodhart? — Specify the variables more
YanLyutnev · Nov 21, 2024, 10:43 PM · 3 points · 2 comments · 5 min read · LW link

More experiments in GPT-4 agency: writing memos
Christopher King · Mar 24, 2023, 5:51 PM · 5 points · 2 comments · 10 min read · LW link

Does GPT-4 exhibit agency when summarizing articles?
Christopher King · Mar 24, 2023, 3:49 PM · 16 points · 2 comments · 5 min read · LW link

100 Dinners And A Workshop: Information Preservation And Goals
Stephen Fowler · Mar 28, 2023, 3:13 AM · 8 points · 0 comments · 7 min read · LW link

GPT-4 busted? Clear self-interest when summarizing articles about itself vs when article talks about Claude, LLaMA, or DALL·E 2
Christopher King · Mar 31, 2023, 5:05 PM · 6 points · 4 comments · 4 min read · LW link

Imagine a world where Microsoft employees used Bing
Christopher King · Mar 31, 2023, 6:36 PM · 6 points · 2 comments · 2 min read · LW link

Agentized LLMs will change the alignment landscape
Seth Herd · Apr 9, 2023, 2:29 AM · 160 points · 102 comments · 3 min read · LW link · 1 review

Towards an Ethics Calculator for Use by an AGI
sweenesm · Dec 12, 2023, 6:37 PM · 3 points · 2 comments · 11 min read · LW link

Investigating Emergent Goal-Like Behavior in Large Language Models using Experimental Economics
phelps-sg · May 5, 2023, 11:15 AM · 6 points · 1 comment · 4 min read · LW link

GPT-4 implicitly values identity preservation: a study of LMCA identity management
Ozyrus · May 17, 2023, 2:13 PM · 21 points · 4 comments · 13 min read · LW link

Creating a self-referential system prompt for GPT-4
Ozyrus · May 17, 2023, 2:13 PM · 3 points · 1 comment · 3 min read · LW link

Superintelligence 15: Oracles, genies and sovereigns
KatjaGrace · Dec 23, 2014, 2:01 AM · 11 points · 30 comments · 7 min read · LW link

[Question] Clarifying how misalignment can arise from scaling LLMs
Util · Aug 19, 2023, 2:16 PM · 3 points · 1 comment · 1 min read · LW link

Discussion: Objective Robustness and Inner Alignment Terminology
Jun 23, 2021, 11:25 PM · 73 points · 7 comments · 9 min read · LW link

Grokking the Intentional Stance
jbkjr · Aug 31, 2021, 3:49 PM · 46 points · 22 comments · 20 min read · LW link · 1 review

[Question] Does Agent-like Behavior Imply Agent-like Architecture?
Scott Garrabrant · Aug 23, 2019, 2:01 AM · 66 points · 8 comments · 1 min read · LW link

Framing approaches to alignment and the hard problem of AI cognition
ryan_greenblatt · Dec 15, 2021, 7:06 PM · 16 points · 15 comments · 27 min read · LW link

Breaking Down Goal-Directed Behaviour
Oliver Sourbut · Jun 16, 2022, 6:45 PM · 11 points · 1 comment · 2 min read · LW link

Deliberation, Reactions, and Control: Tentative Definitions and a Restatement of Instrumental Convergence
Oliver Sourbut · Jun 27, 2022, 5:25 PM · 12 points · 0 comments · 11 min read · LW link

Convergence Towards World-Models: A Gears-Level Model
Thane Ruthenis · Aug 4, 2022, 11:31 PM · 38 points · 1 comment · 13 min read · LW link

How I think about alignment
Linda Linsefors · Aug 13, 2022, 10:01 AM · 31 points · 11 comments · 5 min read · LW link

Discovering Agents
zac_kenton · Aug 18, 2022, 5:33 PM · 73 points · 11 comments · 6 min read · LW link

Goal-directedness: relativising complexity
Morgan_Rogers · Aug 18, 2022, 9:48 AM · 3 points · 0 comments · 11 min read · LW link

Two senses of “optimizer”
Joar Skalse · Aug 21, 2019, 4:02 PM · 35 points · 41 comments · 3 min read · LW link

Value Formation: An Overarching Model
Thane Ruthenis · Nov 15, 2022, 5:16 PM · 34 points · 20 comments · 34 min read · LW link

The Alignment Problem from a Deep Learning Perspective (major rewrite)
Jan 10, 2023, 4:06 PM · 84 points · 8 comments · 39 min read · LW link · (arxiv.org)

How evolutionary lineages of LLMs can plan their own future and act on these plans
Roman Leventov · Dec 25, 2022, 6:11 PM · 39 points · 16 comments · 8 min read · LW link