Embedded Agency

Tag · Last edit: 4 Jan 2023 2:57 UTC by Daniel_Eth

Embedded Agency is the problem that a theory of rational agents must account for the fact that the agents we create (and we ourselves) are part of the world or universe we are trying to affect, not separate from it. This contrasts with much of the current basic theory of AI and rationality (such as Solomonoff induction or Bayesianism), which implicitly assumes a separation between the agent and the things the agent has beliefs about. In other words, agents in this universe do not sit behind the Cartesian or dualistic boundary that much of philosophy assumes, a border across which only sensory inputs and motor outputs pass. Instead they must be understood reductively: agents are made up of non-agent parts like bits and atoms.
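As an illustration only (a toy sketch under stated assumptions, not taken from the Embedded Agency sequence), the contrast can be caricatured in Python. In the Cartesian loop the environment can only pass observations to a fixed policy; in the embedded loop the policy is itself part of the world state, so world dynamics can rewrite it. The `eat_mushroom` action, loosely modeled on Arbital's psychedelic-mushroom example of an event that violates the Cartesian boundary, and every other name here are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, List

# Hypothetical toy types: a policy maps an integer observation to an action label.
Policy = Callable[[int], str]


def cartesian_run(policy: Policy, env_step: Callable[[int, str], int], steps: int) -> List[str]:
    """Cartesian loop: observations go in, actions come out, and nothing else
    crosses the boundary. The environment never sees or changes `policy`."""
    obs, actions = 0, []
    for _ in range(steps):
        action = policy(obs)
        actions.append(action)
        obs = env_step(obs, action)
    return actions


@dataclass(frozen=True)
class World:
    obs: int
    agent_policy: Policy  # the agent's own computation is part of the world state


def embedded_run(world: World, world_step: Callable[[World, str], World], steps: int) -> List[str]:
    """Embedded loop: the world update receives the whole world, including the
    agent, so it can rewrite how the agent computes future actions."""
    actions = []
    for _ in range(steps):
        action = world.agent_policy(world.obs)
        actions.append(action)
        world = world_step(world, action)
    return actions


def mushroom_policy(obs: int) -> str:
    # Hypothetical policy: eat the mushroom on observation 2, otherwise wait.
    return "eat_mushroom" if obs == 2 else "wait"


def cartesian_env(obs: int, action: str) -> int:
    # The environment can only advance the observation stream.
    return obs + 1


def embedded_world_step(world: World, action: str) -> World:
    # Eating the mushroom alters the agent's computation itself, not just its inputs.
    new_policy = (lambda obs: "dance") if action == "eat_mushroom" else world.agent_policy
    return replace(world, obs=world.obs + 1, agent_policy=new_policy)


print(cartesian_run(mushroom_policy, cartesian_env, 5))
# ['wait', 'wait', 'eat_mushroom', 'wait', 'wait']
print(embedded_run(World(0, mushroom_policy), embedded_world_step, 5))
# ['wait', 'wait', 'eat_mushroom', 'dance', 'dance']
```

In the Cartesian version the mushroom can never change behavior, because the only channel into the agent is `obs`. In the embedded version the same event changes every later action. This is still a caricature: it hands the agent a clean observation interface, which a genuinely embedded agent does not get.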

Embedded Agency is not a fully formalized research agenda, but Scott Garrabrant and Abram Demski have written the canonical explanation of the idea in their sequence Embedded Agency. The sequence points to many of the core confusions we have about rational agency and attempts to tie them into a single picture.

Embedded Agency (full-text version)

Scott Garrabrant and abramdemski

15 Nov 2018 19:49 UTC

209 points

17 comments · 54 min read

Humans Are Embedded Agents Too

johnswentworth

23 Dec 2019 19:21 UTC

82 points

21 comments · 5 min read

Embedded Agents

abramdemski and Scott Garrabrant

29 Oct 2018 19:53 UTC

234 points

42 comments · 1 min read · 2 reviews

Introduction to Cartesian Frames

Scott Garrabrant

22 Oct 2020 13:00 UTC

155 points

32 comments · 22 min read · 1 review

Draft papers for REALab and Decoupled Approval on tampering

Jonathan Uesato and Ramana Kumar

28 Oct 2020 16:01 UTC

47 points

2 comments · 1 min read

Embedded World-Models

abramdemski and Scott Garrabrant

2 Nov 2018 16:07 UTC

96 points

16 comments · 1 min read

Robust Delegation

abramdemski and Scott Garrabrant

4 Nov 2018 16:38 UTC

116 points

10 comments · 1 min read

Embedded Agency via Abstraction

johnswentworth

26 Aug 2019 23:03 UTC

42 points

20 comments · 11 min read

Decision Theory

abramdemski and Scott Garrabrant

31 Oct 2018 18:41 UTC

121 points

45 comments · 1 min read

Subsystem Alignment

abramdemski and Scott Garrabrant

6 Nov 2018 16:16 UTC

102 points

12 comments · 1 min read

Updates and additions to “Embedded Agency”

Rob Bensinger and abramdemski

29 Aug 2020 4:22 UTC

82 points

1 comment · 3 min read

You Only Get One Shot: an Intuition Pump for Embedded Agency

Oliver Sourbut

9 Jun 2022 21:38 UTC

24 points

4 comments · 2 min read

Embedded Curiosities

Scott Garrabrant and abramdemski

8 Nov 2018 14:19 UTC

91 points

1 comment · 2 min read

Eight Definitions of Observability

Scott Garrabrant

10 Nov 2020 23:37 UTC

34 points

26 comments · 12 min read

AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant

DanielFilan

24 Jun 2021 22:10 UTC

59 points

2 comments · 59 min read

(A → B) → A

Scott Garrabrant

11 Sep 2018 22:38 UTC

80 points

11 comments · 2 min read

Reducing LLM deception at scale with self-other overlap fine-tuning

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Cameron Berg, Mike Vaiana and AE Studio

13 Mar 2025 19:09 UTC

155 points

41 comments · 6 min read

(Double-)Inverse Embedded Agency Problem

Shmi

8 Jan 2020 4:30 UTC

27 points

8 comments · 2 min read

Cartesian Frames Definitions

Rob Bensinger

8 Nov 2020 12:44 UTC

28 points

0 comments · 4 min read

Committing, Assuming, Externalizing, and Internalizing

Scott Garrabrant

9 Nov 2020 16:59 UTC

31 points

25 comments · 10 min read

Embedded Agency: Not Just an AI Problem

johnswentworth

27 Jun 2019 0:35 UTC

15 points

10 comments · 2 min read

Logical Updatelessness as a Robust Delegation Problem

Scott Garrabrant

27 Oct 2017 21:16 UTC

38 points

2 comments · 2 min read

Non-Monotonic Infra-Bayesian Physicalism

Marcus Ogren

2 Apr 2025 12:14 UTC

34 points

0 comments · 18 min read

[Question] Are You More Real If You’re Really Forgetful?

Thane Ruthenis

24 Nov 2024 19:30 UTC

39 points

25 comments · 5 min read

All the Following are Distinct

Gianluca Calcagni

2 Aug 2024 16:35 UTC

16 points

3 comments · 9 min read

Uncertainty in all its flavours

Cleo Nardo

9 Jan 2024 16:21 UTC

34 points

6 comments · 35 min read

Additive Operations on Cartesian Frames

Scott Garrabrant

26 Oct 2020 15:12 UTC

62 points

6 comments · 11 min read

Meaning & Agency

abramdemski

19 Dec 2023 22:27 UTC

93 points

17 comments · 14 min read

General alignment properties

TurnTrout

8 Aug 2022 23:40 UTC

51 points

2 comments · 1 min read

[Question] What are brains?

Valentine

10 Jun 2023 14:46 UTC

10 points

22 comments · 2 min read

Embedded Agents are Quines

lsusr and DaemonicSigil

12 Dec 2023 4:57 UTC

11 points

7 comments · 8 min read

Infra-Bayesian physicalism: a formal theory of naturalized induction

Vanessa Kosoy

30 Nov 2021 22:25 UTC

114 points

23 comments · 42 min read · 1 review

Controllables and Observables, Revisited

Scott Garrabrant

29 Oct 2020 16:38 UTC

35 points

5 comments · 8 min read

Functors and Coarse Worlds

Scott Garrabrant

30 Oct 2020 15:19 UTC

52 points

3 comments · 8 min read

Time in Cartesian Frames

Scott Garrabrant

11 Nov 2020 20:25 UTC

48 points

16 comments · 7 min read

When does rationality-as-search have nontrivial implications?

nostalgebraist

4 Nov 2018 22:42 UTC

72 points

12 comments · 3 min read

Botworld: a cellular automaton for studying self-modifying agents embedded in their environment

So8res

12 Apr 2014 0:56 UTC

80 points

54 comments · 7 min read

Sub-Sums and Sub-Tensors

Scott Garrabrant

5 Nov 2020 18:06 UTC

34 points

4 comments · 8 min read

Consequentialists: One-Way Pattern Traps

David Udell

16 Jan 2023 20:48 UTC

59 points

3 comments · 14 min read

Infra-Bayesianism Distillation: Realizability and Decision Theory

Thomas Larsen

26 May 2022 21:57 UTC

40 points

9 comments · 18 min read

“embedded self-justification,” or something like that

nostalgebraist

3 Nov 2019 3:20 UTC

40 points

14 comments · 5 min read

(nostalgebraist.tumblr.com)

MIRI/OP exchange about decision theory

Rob Bensinger

25 Aug 2021 22:44 UTC

56 points

7 comments · 10 min read

The whirlpool of reality

Gordon Seidoh Worley

27 Sep 2020 2:36 UTC

9 points

2 comments · 2 min read

Mistral Large 2 (123B) seems to exhibit alignment faking

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Cameron Berg, Judd Rosenblatt, Mike Vaiana and AE Studio

27 Mar 2025 15:39 UTC

80 points

4 comments · 13 min read

Biextensional Equivalence

Scott Garrabrant

28 Oct 2020 14:07 UTC

43 points

13 comments · 10 min read

Subagents of Cartesian Frames

Scott Garrabrant

2 Nov 2020 22:02 UTC

53 points

6 comments · 8 min read

Multiplicative Operations on Cartesian Frames

Scott Garrabrant

3 Nov 2020 19:27 UTC

34 points

24 comments · 12 min read

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’

lukemarks

11 Jun 2023 0:13 UTC

22 points

0 comments · 5 min read

[Question] Would this be Progress in Solving Embedded Agency?

Johannes C. Mayer

14 Nov 2023 9:08 UTC

9 points

2 comments · 2 min read

What Program Are You?

RobinHanson

12 Oct 2009 0:29 UTC

36 points

43 comments · 2 min read

Performance guarantees in classical learning theory and infra-Bayesianism

David Matolcsi

28 Feb 2023 18:37 UTC

9 points

4 comments · 31 min read

[Question] Define “Agent” (Embedded)

Apollonia

24 Mar 2024 20:14 UTC

10 points

1 comment · 1 min read

Rational Effective Utopia & Narrow Way There: Multiversal AI Alignment, Place AI, New Ethicophysics… (Updated)

ank

11 Feb 2025 3:21 UTC

13 points

8 comments · 35 min read

Causal representation learning as a technique to prevent goal misgeneralization

PabloAMC

4 Jan 2023 0:07 UTC

21 points

0 comments · 8 min read

Identifiability Problem for Superrational Decision Theories

Bunthut

9 Apr 2021 20:33 UTC

17 points

16 comments · 2 min read

A Possible Resolution To Spurious Counterfactuals

JoshuaOSHickman

6 Dec 2021 18:26 UTC

15 points

5 comments · 4 min read

Action theory is not policy theory is not agent theory

Cole Wyeth

5 Sep 2023 1:38 UTC

20 points

4 comments · 6 min read

(colewyeth.com)

Anthropics and Embedded Agency

dadadarren

26 Jun 2021 1:45 UTC

7 points

2 comments · 2 min read

Open Problems in AIXI Agent Foundations

Cole Wyeth

12 Sep 2024 15:38 UTC

42 points

2 comments · 10 min read

Phylactery Decision Theory

Bunthut

2 Apr 2021 20:55 UTC

14 points

6 comments · 2 min read

A Rephrasing Of and Footnote To An Embedded Agency Proposal

JoshuaOSHickman

9 Mar 2022 18:13 UTC

5 points

0 comments · 5 min read

Jonothan Gorard:The territory is isomorphic to an equivalence class of its maps

Daniel C

7 Sep 2024 10:04 UTC

19 points

18 comments · 2 min read

(x.com)

Self-Other Overlap: A Neglected Approach to AI Alignment

Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena, Cameron Berg and AE Studio

30 Jul 2024 16:22 UTC

223 points

51 comments · 12 min read

The Unavoidable Experience of Free Will in a Deterministic World

gmax

3 Nov 2023 17:55 UTC

−12 points

0 comments · 3 min read

Additive and Multiplicative Subagents

Scott Garrabrant

6 Nov 2020 14:26 UTC

20 points

7 comments · 12 min read

Exploring Decision Theories With Counterfactuals and Dynamic Agent Self-Pointers

JoshuaOSHickman

18 Dec 2021 21:50 UTC

2 points

0 comments · 4 min read

[Question] Choice := Anthropics uncertainty? And potential implications for agency

Antoine de Scorraille

21 Apr 2022 16:38 UTC

6 points

1 comment · 1 min read

LLMs may capture key components of human agency

catubc

17 Nov 2022 20:14 UTC

27 points

0 comments · 4 min read

The Way You Go Depends A Good Deal On Where You Want To Get: FEP minimizes surprise about actions using preferences about the future as evidence

Christopher King

27 Apr 2025 21:55 UTC

9 points

5 comments · 5 min read

Rebuttals for ~all criticisms of AIXI

Cole Wyeth

7 Jan 2025 17:41 UTC

25 points

17 comments · 14 min read

Deliberation, Reactions, and Control: Tentative Definitions and a Restatement of Instrumental Convergence

Oliver Sourbut

27 Jun 2022 17:25 UTC

12 points

0 comments · 11 min read

«Boundaries/Membranes» and AI safety compilation

Chris Lakin

3 May 2023 21:41 UTC

56 points

17 comments · 8 min read

Troll Bridge

abramdemski

23 Aug 2019 18:36 UTC

86 points

59 comments · 12 min read

Exploring Mild Behaviour in Embedded Agents

Megan Kinniment

27 Jun 2022 18:56 UTC

21 points

4 comments · 18 min read

Demystifying Born’s rule

Christopher King

14 Jun 2023 3:16 UTC

5 points

26 comments · 3 min read

Optimization Concepts in the Game of Life

Vika and Ramana Kumar

16 Oct 2021 20:51 UTC

75 points

16 comments · 10 min read

Riffing on the agent type

Quinn

8 Dec 2022 0:19 UTC

21 points

3 comments · 4 min read

[Question] Can subjunctive dependence emerge from a simplicity prior?

Daniel C

16 Sep 2024 12:39 UTC

11 points

0 comments · 1 min read

Are pre-specified utility functions about the real world possible in principle?

mlogan

11 Jul 2018 18:46 UTC

24 points

7 comments · 4 min read

Clarifying the free energy principle (with quotes)

Ryo

29 Oct 2023 16:03 UTC

8 points

0 comments · 9 min read

Static Place AI Makes Agentic AI Redundant: Multiversal AI Alignment & Rational Utopia

ank

13 Feb 2025 22:35 UTC

1 point

2 comments · 11 min read

Escaping the Löbian Obstacle

Morgan_Rogers

16 Jun 2021 0:02 UTC

14 points

10 comments · 7 min read

Timeless Decision Theory and Meta-Circular Decision Theory

Eliezer Yudkowsky

20 Aug 2009 22:07 UTC

42 points

37 comments · 10 min read

Live Theory Part 0: Taking Intelligence Seriously

Sahil

26 Jun 2024 21:37 UTC

101 points

3 comments · 8 min read

Apply to the Conceptual Boundaries Workshop for AI Safety

Chris Lakin

27 Nov 2023 21:04 UTC

50 points

0 comments · 3 min read

[Question] Is there Work on Embedded Agency in Cellular Automata Toy Models?

Johannes C. Mayer

14 Nov 2023 9:08 UTC

10 points

0 comments · 1 min read

Des: A Case Study in Emergent Symbolic Continuity in GPT-4o

TallulahMerrall

19 May 2025 10:10 UTC

1 point

0 comments · 5 min read

On Complexity Science

Garrett Baker

5 Apr 2024 2:24 UTC

51 points

19 comments · 4 min read

Beyond Rewards and Values: A Non-dualistic Approach to Universal Intelligence

Akira Pyinya

30 Dec 2022 19:05 UTC

10 points

4 comments · 14 min read

Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

Daniel Herrmann, Aydin Mohseni and ben_levinstein

4 Feb 2025 20:34 UTC

45 points

22 comments · 5 min read

ACI#6: A Non-Dualistic ACI Model

Akira Pyinya

9 Nov 2023 23:01 UTC

10 points

2 comments · 6 min read

Strange Loops—Self-Reference from Number Theory to AI

ojorgensen

28 Sep 2022 14:10 UTC

19 points

6 comments · 18 min read

Normative vs Descriptive Models of Agency

mattmacdermott

2 Feb 2023 20:28 UTC

26 points

5 comments · 4 min read

Some Summaries of Agent Foundations Work

mattmacdermott

15 May 2023 16:09 UTC

62 points

1 comment · 13 min read

Counterfactual Planning in AGI Systems

Koen.Holtman

3 Feb 2021 13:54 UTC

10 points

0 comments · 5 min read

Minds: An Introduction

Rob Bensinger

11 Mar 2015 19:00 UTC

53 points

2 comments · 6 min read

Optimization at a Distance

johnswentworth

16 May 2022 17:58 UTC

88 points

16 comments · 4 min read

Unaligned AGI & Brief History of Inequality

ank

22 Feb 2025 16:26 UTC

−20 points

4 comments · 7 min read

Formalizing Two Problems of Realistic World Models

So8res

22 Jan 2015 23:12 UTC

32 points

5 comments · 2 min read

Could Roko’s basilisk acausally bargain with a paperclip maximizer?

Christopher King

13 Mar 2023 18:21 UTC

1 point

8 comments · 1 min read

jacquesthibs 23 Dec 2022 20:17 UTC
2 points
Embedded Agency is the problem that an understanding of the theory of rational agents must account for the fact that the agents we create (and we ourselves) are inside the world or universe we are trying to affect, and not separated from it. This is in contrast with much current basic theory of AI or Rationality (such as Solomonoff induction or Bayesianism) which implicitly supposes a separation between the agent and the-things-the-agent-has-beliefs about. In other words, agents in this universe do not have Cartesian or dualistic boundaries like much of philosophy thinks, and are instead reductionist, that is agents are made up of non-agent parts like bits and atoms.
Could someone clear up the following for me:
1. In which universe? The “current basic theory of AI” universe?
2. Could we add an intuitive explanation about what is meant by “Cartesian or dualistic boundaries”?
3. Not clear if “agents are made up of non-agent parts like bits and atoms” is for Embedded Agency or the old frame.
- mattmacdermott 8 Feb 2023 19:21 UTC
  2 points
  1. In our universe, as opposed to the “current basic theory of AI” universe.
  2. From Arbital:
  A Cartesian agent setup is one where the agent receives sensory information from the environment, and the agent sends motor outputs to the environment, and nothing else can cross the “Cartesian border” separating the agent and environment. If you can eat a psychedelic mushroom that affects the way you process the world—not just presenting you with sensory information, but altering the computations you do to think—then this is an example of an event that “violates the Cartesian boundary”. Likewise if the agent drops an anvil on its own head. Nothing that happens in a Cartesian universe can kill a Cartesian agent or modify its processing; all the universe can do is send the agent sensory information, in a particular format, that the agent reads.
  3. For embedded agency. In the old frame agents aren’t really made of anything.
