Embedded Agency

Tag · Last edit: Jan 4, 2023, 2:57 AM by Daniel_Eth

Embedded Agency is the problem that a theory of rational agents must account for the fact that the agents we create (and we ourselves) are inside the world or universe we are trying to affect, not separate from it. This contrasts with much of the current basic theory of AI and rationality (such as Solomonoff induction or Bayesianism), which implicitly supposes a separation between the agent and the things the agent has beliefs about. In other words, agents in this universe do not have the Cartesian or dualistic boundaries that much of philosophy assumes. The picture is instead reductionist: agents are made up of non-agent parts like bits and atoms.
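The contrast can be made concrete with a toy sketch. The Python below is only an illustration under stated assumptions, not a formalization from the sequence: the number-line world, the policies `seek_zero` and `do_nothing`, and the hazard at position 1 are all hypothetical. In the Cartesian version the agent's policy lives outside the world state, so nothing in the world can alter it. In the embedded version the policy is one more part of the world, so ordinary world dynamics can overwrite it.

```python
from typing import Callable

Policy = Callable[[int], int]  # maps an observation to an action


def seek_zero(observation: int) -> int:
    """Toy policy (hypothetical): step along a number line toward 0."""
    if observation > 0:
        return -1
    if observation < 0:
        return 1
    return 0


def do_nothing(observation: int) -> int:
    """Stand-in for a damaged agent that no longer acts."""
    return 0


class CartesianEnvironment:
    """World state excludes the agent; only observations and actions cross the boundary."""

    def __init__(self, position: int) -> None:
        self.position = position

    def step(self, action: int) -> int:
        self.position += action
        return self.position  # the observation sent back to the agent


class EmbeddedWorld:
    """The agent's policy is part of the world state, so world dynamics can rewrite it."""

    def __init__(self, position: int, policy: Policy) -> None:
        self.position = position
        self.policy = policy

    def step(self) -> int:
        self.position += self.policy(self.position)
        if self.position == 1:  # hypothetical hazard located at position 1
            self.policy = do_nothing
        return self.position


def run_cartesian(steps: int, start: int = 3) -> int:
    """Run the dualistic loop: the policy sits outside the environment."""
    env = CartesianEnvironment(start)
    observation = start
    for _ in range(steps):
        observation = env.step(seek_zero(observation))
    return observation


def run_embedded(steps: int, start: int = 3) -> int:
    """Run the embedded loop: the policy is stored inside the world."""
    world = EmbeddedWorld(start, seek_zero)
    for _ in range(steps):
        world.step()
    return world.position
```

Under these assumptions, `run_cartesian(5)` reaches position 0, while `run_embedded(5)` gets stuck at position 1 because the world has overwritten the agent's policy on the way there.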

Embedded Agency is not a fully formalized research agenda, but Scott Garrabrant and Abram Demski have written the canonical explanation of the idea in their sequence Embedded Agency. The sequence points to many of the core confusions we have about rational agency and attempts to tie them into a single picture.

Embedded Agency (full-text version)

Scott Garrabrant and abramdemski

Nov 15, 2018, 7:49 PM

209 points

17 comments · 54 min read · LW link

Humans Are Embedded Agents Too

johnswentworth

Dec 23, 2019, 7:21 PM

82 points

21 comments · 5 min read · LW link

Embedded Agents

abramdemski and Scott Garrabrant

Oct 29, 2018, 7:53 PM

234 points

42 comments · 1 min read · LW link · 2 reviews

Introduction to Cartesian Frames

Scott Garrabrant

Oct 22, 2020, 1:00 PM

155 points

32 comments · 22 min read · LW link · 1 review

Draft papers for REALab and Decoupled Approval on tampering

Jonathan Uesato and Ramana Kumar

Oct 28, 2020, 4:01 PM

47 points

2 comments · 1 min read · LW link

Embedded World-Models

abramdemski and Scott Garrabrant

Nov 2, 2018, 4:07 PM

96 points

16 comments · 1 min read · LW link

Robust Delegation

abramdemski and Scott Garrabrant

Nov 4, 2018, 4:38 PM

116 points

10 comments · 1 min read · LW link

Embedded Agency via Abstraction

johnswentworth

Aug 26, 2019, 11:03 PM

42 points

20 comments · 11 min read · LW link

Decision Theory

abramdemski and Scott Garrabrant

Oct 31, 2018, 6:41 PM

121 points

45 comments · 1 min read · LW link

Subsystem Alignment

abramdemski and Scott Garrabrant

Nov 6, 2018, 4:16 PM

102 points

12 comments · 1 min read · LW link

Updates and additions to “Embedded Agency”

Rob Bensinger and abramdemski

Aug 29, 2020, 4:22 AM

82 points

1 comment · 3 min read · LW link

You Only Get One Shot: an Intuition Pump for Embedded Agency

Oliver Sourbut

Jun 9, 2022, 9:38 PM

24 points

4 comments · 2 min read · LW link

Embedded Curiosities

Scott Garrabrant and abramdemski

Nov 8, 2018, 2:19 PM

91 points

1 comment · 2 min read · LW link

Eight Definitions of Observability

Scott Garrabrant

Nov 10, 2020, 11:37 PM

34 points

26 comments · 12 min read · LW link

AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant

DanielFilan

Jun 24, 2021, 10:10 PM

59 points

2 comments · 59 min read · LW link

(A → B) → A

Scott Garrabrant

Sep 11, 2018, 10:38 PM

80 points

11 comments · 2 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Cameron Berg, Mike Vaiana and AE Studio

Mar 13, 2025, 7:09 PM

155 points

41 comments · 6 min read · LW link

(Double-)Inverse Embedded Agency Problem

Shmi

Jan 8, 2020, 4:30 AM

27 points

8 comments · 2 min read · LW link

Cartesian Frames Definitions

Rob Bensinger

Nov 8, 2020, 12:44 PM

28 points

0 comments · 4 min read · LW link

Committing, Assuming, Externalizing, and Internalizing

Scott Garrabrant

Nov 9, 2020, 4:59 PM

31 points

25 comments · 10 min read · LW link

Embedded Agency: Not Just an AI Problem

johnswentworth

Jun 27, 2019, 12:35 AM

15 points

10 comments · 2 min read · LW link

Logical Updatelessness as a Robust Delegation Problem

Scott Garrabrant

Oct 27, 2017, 9:16 PM

38 points

2 comments · 2 min read · LW link

Non-Monotonic Infra-Bayesian Physicalism

Marcus Ogren

Apr 2, 2025, 12:14 PM

34 points

0 comments · 18 min read · LW link

[Question] Are You More Real If You’re Really Forgetful?

Thane Ruthenis

Nov 24, 2024, 7:30 PM

39 points

25 comments · 5 min read · LW link

All the Following are Distinct

Gianluca Calcagni

Aug 2, 2024, 4:35 PM

16 points

3 comments · 9 min read · LW link

Uncertainty in all its flavours

Cleo Nardo

Jan 9, 2024, 4:21 PM

34 points

6 comments · 35 min read · LW link

Additive Operations on Cartesian Frames

Scott Garrabrant

Oct 26, 2020, 3:12 PM

62 points

6 comments · 11 min read · LW link

Meaning & Agency

abramdemski

Dec 19, 2023, 10:27 PM

91 points

17 comments · 14 min read · LW link

General alignment properties

TurnTrout

Aug 8, 2022, 11:40 PM

51 points

2 comments · 1 min read · LW link

[Question] What are brains?

Valentine

Jun 10, 2023, 2:46 PM

10 points

22 comments · 2 min read · LW link

Embedded Agents are Quines

lsusr and DaemonicSigil

Dec 12, 2023, 4:57 AM

11 points

7 comments · 8 min read · LW link

Infra-Bayesian physicalism: a formal theory of naturalized induction

Vanessa Kosoy

Nov 30, 2021, 10:25 PM

114 points

23 comments · 42 min read · LW link · 1 review

Controllables and Observables, Revisited

Scott Garrabrant

Oct 29, 2020, 4:38 PM

35 points

5 comments · 8 min read · LW link

Functors and Coarse Worlds

Scott Garrabrant

Oct 30, 2020, 3:19 PM

52 points

3 comments · 8 min read · LW link

Time in Cartesian Frames

Scott Garrabrant

Nov 11, 2020, 8:25 PM

48 points

16 comments · 7 min read · LW link

When does rationality-as-search have nontrivial implications?

nostalgebraist

Nov 4, 2018, 10:42 PM

72 points

12 comments · 3 min read · LW link

Botworld: a cellular automaton for studying self-modifying agents embedded in their environment

So8res

Apr 12, 2014, 12:56 AM

80 points

54 comments · 7 min read · LW link

Sub-Sums and Sub-Tensors

Scott Garrabrant

Nov 5, 2020, 6:06 PM

34 points

4 comments · 8 min read · LW link

Consequentialists: One-Way Pattern Traps

David Udell

Jan 16, 2023, 8:48 PM

59 points

3 comments · 14 min read · LW link

Infra-Bayesianism Distillation: Realizability and Decision Theory

Thomas Larsen

May 26, 2022, 9:57 PM

40 points

9 comments · 18 min read · LW link

“embedded self-justification,” or something like that

nostalgebraist

Nov 3, 2019, 3:20 AM

40 points

14 comments · 5 min read · LW link

(nostalgebraist.tumblr.com)

MIRI/OP exchange about decision theory

Rob Bensinger

Aug 25, 2021, 10:44 PM

56 points

7 comments · 10 min read · LW link

The whirlpool of reality

Gordon Seidoh Worley

Sep 27, 2020, 2:36 AM

9 points

2 comments · 2 min read · LW link

Mistral Large 2 (123B) seems to exhibit alignment faking

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Cameron Berg, Judd Rosenblatt, Mike Vaiana and AE Studio

Mar 27, 2025, 3:39 PM

80 points

4 comments · 13 min read · LW link

Biextensional Equivalence

Scott Garrabrant

Oct 28, 2020, 2:07 PM

43 points

13 comments · 10 min read · LW link

Subagents of Cartesian Frames

Scott Garrabrant

Nov 2, 2020, 10:02 PM

53 points

6 comments · 8 min read · LW link

Multiplicative Operations on Cartesian Frames

Scott Garrabrant

Nov 3, 2020, 7:27 PM

34 points

24 comments · 12 min read · LW link

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’

lukemarks

Jun 11, 2023, 12:13 AM

22 points

0 comments · 5 min read · LW link

[Question] Would this be Progress in Solving Embedded Agency?

Johannes C. Mayer

Nov 14, 2023, 9:08 AM

9 points

2 comments · 2 min read · LW link

What Program Are You?

RobinHanson

Oct 12, 2009, 12:29 AM

36 points

43 comments · 2 min read · LW link

Performance guarantees in classical learning theory and infra-Bayesianism

David Matolcsi

Feb 28, 2023, 6:37 PM

9 points

4 comments · 31 min read · LW link

[Question] Define “Agent” (Embedded)

Apollonia

Mar 24, 2024, 8:14 PM

10 points

1 comment · 1 min read · LW link

Rational Effective Utopia & Narrow Way There: Multiversal AI Alignment, Place AI, New Ethicophysics… (Updated)

ank

Feb 11, 2025, 3:21 AM

13 points

8 comments · 35 min read · LW link

Causal representation learning as a technique to prevent goal misgeneralization

PabloAMC

Jan 4, 2023, 12:07 AM

21 points

0 comments · 8 min read · LW link

Identifiability Problem for Superrational Decision Theories

Bunthut

Apr 9, 2021, 8:33 PM

17 points

16 comments · 2 min read · LW link

A Possible Resolution To Spurious Counterfactuals

JoshuaOSHickman

Dec 6, 2021, 6:26 PM

15 points

5 comments · 4 min read · LW link

Action theory is not policy theory is not agent theory

Cole Wyeth

Sep 5, 2023, 1:38 AM

20 points

4 comments · 6 min read · LW link

(colewyeth.com)

Anthropics and Embedded Agency

dadadarren

Jun 26, 2021, 1:45 AM

7 points

2 comments · 2 min read · LW link

Open Problems in AIXI Agent Foundations

Cole Wyeth

Sep 12, 2024, 3:38 PM

42 points

2 comments · 10 min read · LW link

Phylactery Decision Theory

Bunthut

Apr 2, 2021, 8:55 PM

14 points

6 comments · 2 min read · LW link

A Rephrasing Of and Footnote To An Embedded Agency Proposal

JoshuaOSHickman

Mar 9, 2022, 6:13 PM

5 points

0 comments · 5 min read · LW link

Jonothan Gorard: The territory is isomorphic to an equivalence class of its maps

Daniel C

Sep 7, 2024, 10:04 AM

19 points

18 comments · 2 min read · LW link

(x.com)

Self-Other Overlap: A Neglected Approach to AI Alignment

Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena, Cameron Berg and AE Studio

Jul 30, 2024, 4:22 PM

223 points

51 comments · 12 min read · LW link

The Unavoidable Experience of Free Will in a Deterministic World

gmax

Nov 3, 2023, 5:55 PM

−12 points

0 comments · 3 min read · LW link

Additive and Multiplicative Subagents

Scott Garrabrant

Nov 6, 2020, 2:26 PM

20 points

7 comments · 12 min read · LW link

Exploring Decision Theories With Counterfactuals and Dynamic Agent Self-Pointers

JoshuaOSHickman

Dec 18, 2021, 9:50 PM

2 points

0 comments · 4 min read · LW link

[Question] Choice := Anthropics uncertainty? And potential implications for agency

Antoine de Scorraille

Apr 21, 2022, 4:38 PM

6 points

1 comment · 1 min read · LW link

LLMs may capture key components of human agency

catubc

Nov 17, 2022, 8:14 PM

27 points

0 comments · 4 min read · LW link

The Way You Go Depends A Good Deal On Where You Want To Get: FEP minimizes surprise about actions using preferences about the future as evidence

Christopher King

Apr 27, 2025, 9:55 PM

9 points

5 comments · 5 min read · LW link

Rebuttals for ~all criticisms of AIXI

Cole Wyeth

Jan 7, 2025, 5:41 PM

25 points

17 comments · 14 min read · LW link

Deliberation, Reactions, and Control: Tentative Definitions and a Restatement of Instrumental Convergence

Oliver Sourbut

Jun 27, 2022, 5:25 PM

12 points

0 comments · 11 min read · LW link

«Boundaries/Membranes» and AI safety compilation

Chris Lakin

May 3, 2023, 9:41 PM

56 points

17 comments · 8 min read · LW link

Troll Bridge

abramdemski

Aug 23, 2019, 6:36 PM

86 points

59 comments · 12 min read · LW link

Exploring Mild Behaviour in Embedded Agents

Megan Kinniment

Jun 27, 2022, 6:56 PM

21 points

4 comments · 18 min read · LW link

Demystifying Born’s rule

Christopher King

Jun 14, 2023, 3:16 AM

5 points

26 comments · 3 min read · LW link

Optimization Concepts in the Game of Life

Vika and Ramana Kumar

Oct 16, 2021, 8:51 PM

75 points

16 comments · 10 min read · LW link

Riffing on the agent type

Quinn

Dec 8, 2022, 12:19 AM

21 points

3 comments · 4 min read · LW link

[Question] Can subjunctive dependence emerge from a simplicity prior?

Daniel C

Sep 16, 2024, 12:39 PM

11 points

0 comments · 1 min read · LW link

Are pre-specified utility functions about the real world possible in principle?

mlogan

Jul 11, 2018, 6:46 PM

24 points

7 comments · 4 min read · LW link

Clarifying the free energy principle (with quotes)

Ryo

Oct 29, 2023, 4:03 PM

8 points

0 comments · 9 min read · LW link

Static Place AI Makes Agentic AI Redundant: Multiversal AI Alignment & Rational Utopia

ank

Feb 13, 2025, 10:35 PM

1 point

2 comments · 11 min read · LW link

Escaping the Löbian Obstacle

Morgan_Rogers

Jun 16, 2021, 12:02 AM

14 points

10 comments · 7 min read · LW link

Timeless Decision Theory and Meta-Circular Decision Theory

Eliezer Yudkowsky

Aug 20, 2009, 10:07 PM

42 points

37 comments · 10 min read · LW link

Live Theory Part 0: Taking Intelligence Seriously

Sahil

Jun 26, 2024, 9:37 PM

101 points

3 comments · 8 min read · LW link

Apply to the Conceptual Boundaries Workshop for AI Safety

Chris Lakin

Nov 27, 2023, 9:04 PM

50 points

0 comments · 3 min read · LW link

[Question] Is there Work on Embedded Agency in Cellular Automata Toy Models?

Johannes C. Mayer

Nov 14, 2023, 9:08 AM

10 points

0 comments · 1 min read · LW link

Des: A Case Study in Emergent Symbolic Continuity in GPT-4o

TallulahMerrall

May 19, 2025, 10:10 AM

1 point

0 comments · 5 min read · LW link

On Complexity Science

Garrett Baker

Apr 5, 2024, 2:24 AM

51 points

19 comments · 4 min read · LW link

Beyond Rewards and Values: A Non-dualistic Approach to Universal Intelligence

Akira Pyinya

Dec 30, 2022, 7:05 PM

10 points

4 comments · 14 min read · LW link

Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

Daniel Herrmann, Aydin Mohseni and ben_levinstein

Feb 4, 2025, 8:34 PM

45 points

22 comments · 5 min read · LW link

ACI#6: A Non-Dualistic ACI Model

Akira Pyinya

Nov 9, 2023, 11:01 PM

10 points

2 comments · 6 min read · LW link

Strange Loops—Self-Reference from Number Theory to AI

ojorgensen

Sep 28, 2022, 2:10 PM

19 points

6 comments · 18 min read · LW link

Normative vs Descriptive Models of Agency

mattmacdermott

Feb 2, 2023, 8:28 PM

26 points

5 comments · 4 min read · LW link

Some Summaries of Agent Foundations Work

mattmacdermott

May 15, 2023, 4:09 PM

62 points

1 comment · 13 min read · LW link

Counterfactual Planning in AGI Systems

Koen.Holtman

Feb 3, 2021, 1:54 PM

10 points

0 comments · 5 min read · LW link

Minds: An Introduction

Rob Bensinger

Mar 11, 2015, 7:00 PM

53 points

2 comments · 6 min read · LW link

Optimization at a Distance

johnswentworth

May 16, 2022, 5:58 PM

88 points

16 comments · 4 min read · LW link

Unaligned AGI & Brief History of Inequality

ank

Feb 22, 2025, 4:26 PM

−20 points

4 comments · 7 min read · LW link

Formalizing Two Problems of Realistic World Models

So8res

Jan 22, 2015, 11:12 PM

32 points

5 comments · 2 min read · LW link

Could Roko’s basilisk acausally bargain with a paperclip maximizer?

Christopher King

Mar 13, 2023, 6:21 PM

1 point

8 comments · 1 min read · LW link

jacquesthibs Dec 23, 2022, 8:17 PM
2 points
Embedded Agency is the problem that an understanding of the theory of rational agents must account for the fact that the agents we create (and we ourselves) are inside the world or universe we are trying to affect, and not separated from it. This is in contrast with much current basic theory of AI or Rationality (such as Solomonoff induction or Bayesianism) which implicitly supposes a separation between the agent and the-things-the-agent-has-beliefs about. In other words, agents in this universe do not have Cartesian or dualistic boundaries like much of philosophy thinks, and are instead reductionist, that is agents are made up of non-agent parts like bits and atoms.
Could someone clear up the following for me:
1. In which universe? The “current basic theory of AI” universe?
2. Could we add an intuitive explanation about what is meant by “Cartesian or dualistic boundaries”?
3. Not clear if “agents are made up of non-agent parts like bits and atoms” is for Embedded Agency or the old frame.
- mattmacdermott Feb 8, 2023, 7:21 PM
  2 points
  1. In our universe, as opposed to the “current basic theory of AI” universe.
  2. From Arbital:
  A Cartesian agent setup is one where the agent receives sensory information from the environment, and the agent sends motor outputs to the environment, and nothing else can cross the “Cartesian border” separating the agent and environment. If you can eat a psychedelic mushroom that affects the way you process the world—not just presenting you with sensory information, but altering the computations you do to think—then this is an example of an event that “violates the Cartesian boundary”. Likewise if the agent drops an anvil on its own head. Nothing that happens in a Cartesian universe can kill a Cartesian agent or modify its processing; all the universe can do is send the agent sensory information, in a particular format, that the agent reads.
  3. For embedded agency. In the old frame agents aren’t really made of anything.
