Embedded Agency

Tag. Last edit: 4 Jan 2023 2:57 UTC by Daniel_Eth

Embedded Agency is the problem of building a theory of rational agents that accounts for the fact that the agents we create (and we ourselves) are inside the world we are trying to affect, not separate from it. This contrasts with much of the current basic theory of AI and rationality (such as Solomonoff induction or Bayesianism), which implicitly assumes a separation between the agent and the things the agent has beliefs about. In other words, agents in this universe do not have the Cartesian or dualistic boundaries that much of philosophy assumes. They are instead reducible: agents are made up of non-agent parts like bits and atoms.
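The contrast between a dualistic agent and an embedded one can be made concrete with a toy sketch. Everything in it (the names `dualistic_step`, `World`, `embedded_step`, and the integer "world") is hypothetical and chosen only for illustration; it is not taken from the Embedded Agency sequence. In the first function the policy sits outside the environment and can never be touched by it. In the second, the policy is just another part of the world state, so ordinary world dynamics can overwrite it.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def dualistic_step(policy: Callable[[int], int], env_state: int) -> int:
    """Cartesian setup: the agent observes and acts from outside the environment."""
    action = policy(env_state)   # the agent reads an observation
    return env_state + action    # the environment changes; the policy itself cannot


@dataclass
class World:
    counter: int
    policy_table: Dict[int, int]  # hypothetical: the agent's "parts", stored in the world


def embedded_step(world: World) -> World:
    """Embedded setup: the agent's policy is part of the state being updated."""
    action = world.policy_table.get(world.counter % 2, 0)
    new_counter = world.counter + action
    new_table = dict(world.policy_table)
    if new_counter >= 3:
        # Illustrative world event: the same dynamics that move the counter
        # also overwrite one entry of the agent's own policy.
        new_table[1] = 0
    return World(new_counter, new_table)
```

Running `embedded_step` a few times from `World(0, {0: 1, 1: 2})` shows the agent's own policy being changed by the world it acts in, which has no counterpart in `dualistic_step`.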

Embedded Agency is not a fully formalized research agenda, but Scott Garrabrant and Abram Demski have written the canonical explanation of the idea in their sequence “Embedded Agency”. The sequence points to many of the core confusions we have about rational agency and attempts to tie them into a single picture.

Embedded Agency (full-text version)

15 Nov 2018 19:49 UTC
201 points
17 comments, 54 min read

Embedded Agents

29 Oct 2018 19:53 UTC
228 points
41 comments, 1 min read, 2 reviews

Humans Are Embedded Agents Too

johnswentworth, 23 Dec 2019 19:21 UTC
81 points
19 comments, 5 min read

Introduction to Cartesian Frames

Scott Garrabrant, 22 Oct 2020 13:00 UTC
155 points
32 comments, 22 min read, 1 review

Draft papers for REALab and Decoupled Approval on tampering

28 Oct 2020 16:01 UTC
47 points
2 comments, 1 min read

Embedded World-Models

2 Nov 2018 16:07 UTC
96 points
16 comments, 1 min read

Robust Delegation

4 Nov 2018 16:38 UTC
116 points
10 comments, 1 min read

Embedded Agency via Abstraction

johnswentworth, 26 Aug 2019 23:03 UTC
42 points
20 comments, 11 min read

Subsystem Alignment

6 Nov 2018 16:16 UTC
99 points
12 comments, 1 min read

Decision Theory

31 Oct 2018 18:41 UTC
120 points
45 comments, 1 min read

Embedded Curiosities

8 Nov 2018 14:19 UTC
91 points
1 comment, 2 min read

Updates and additions to “Embedded Agency”

29 Aug 2020 4:22 UTC
82 points
1 comment, 3 min read

You Only Get One Shot: an Intuition Pump for Embedded Agency

Oliver Sourbut, 9 Jun 2022 21:38 UTC
24 points
4 comments, 2 min read

MIRI/OP exchange about decision theory

Rob Bensinger, 25 Aug 2021 22:44 UTC
55 points
7 comments, 10 min read

Embedded Agency: Not Just an AI Problem

johnswentworth, 27 Jun 2019 0:35 UTC
15 points
10 comments, 2 min read

(A → B) → A

Scott Garrabrant, 11 Sep 2018 22:38 UTC
78 points
11 comments, 2 min read

Botworld: a cellular automaton for studying self-modifying agents embedded in their environment

So8res, 12 Apr 2014 0:56 UTC
80 points
54 comments, 7 min read

Meaning & Agency

abramdemski, 19 Dec 2023 22:27 UTC
91 points
17 comments, 14 min read

When does rationality-as-search have nontrivial implications?

nostalgebraist, 4 Nov 2018 22:42 UTC
72 points
12 comments, 3 min read

Infra-Bayesianism Distillation: Realizability and Decision Theory

Thomas Larsen, 26 May 2022 21:57 UTC
40 points
9 comments, 18 min read

Logical Updatelessness as a Robust Delegation Problem

Scott Garrabrant, 27 Oct 2017 21:16 UTC
38 points
2 comments, 2 min read

Uncertainty in all its flavours

Cleo Nardo, 9 Jan 2024 16:21 UTC
27 points
6 comments, 35 min read

General alignment properties

TurnTrout, 8 Aug 2022 23:40 UTC
50 points
2 comments, 1 min read

[Question] What are brains?

Valentine, 10 Jun 2023 14:46 UTC
10 points
22 comments, 2 min read

Embedded Agents are Quines

12 Dec 2023 4:57 UTC
11 points
7 comments, 8 min read

All the Following are Distinct

Gianluca Calcagni, 2 Aug 2024 16:35 UTC
16 points
3 comments, 8 min read

[Question] Are You More Real If You’re Really Forgetful?

Thane Ruthenis, 24 Nov 2024 19:30 UTC
39 points
25 comments, 5 min read

Biextensional Equivalence

Scott Garrabrant, 28 Oct 2020 14:07 UTC
43 points
13 comments, 10 min read

The whirlpool of reality

Gordon Seidoh Worley, 27 Sep 2020 2:36 UTC
9 points
2 comments, 2 min read

Additive Operations on Cartesian Frames

Scott Garrabrant, 26 Oct 2020 15:12 UTC
62 points
6 comments, 11 min read

Consequentialists: One-Way Pattern Traps

David Udell, 16 Jan 2023 20:48 UTC
59 points
3 comments, 14 min read

Controllables and Observables, Revisited

Scott Garrabrant, 29 Oct 2020 16:38 UTC
35 points
5 comments, 8 min read

Functors and Coarse Worlds

Scott Garrabrant, 30 Oct 2020 15:19 UTC
52 points
3 comments, 8 min read

Sub-Sums and Sub-Tensors

Scott Garrabrant, 5 Nov 2020 18:06 UTC
34 points
4 comments, 8 min read

Multiplicative Operations on Cartesian Frames

Scott Garrabrant, 3 Nov 2020 19:27 UTC
34 points
24 comments, 12 min read

Subagents of Cartesian Frames

Scott Garrabrant, 2 Nov 2020 22:02 UTC
53 points
6 comments, 8 min read

Cartesian Frames Definitions

Rob Bensinger, 8 Nov 2020 12:44 UTC
28 points
0 comments, 4 min read

Committing, Assuming, Externalizing, and Internalizing

Scott Garrabrant, 9 Nov 2020 16:59 UTC
31 points
25 comments, 10 min read

Eight Definitions of Observability

Scott Garrabrant, 10 Nov 2020 23:37 UTC
34 points
26 comments, 12 min read

Time in Cartesian Frames

Scott Garrabrant, 11 Nov 2020 20:25 UTC
48 points
16 comments, 7 min read

“embedded self-justification,” or something like that

nostalgebraist, 3 Nov 2019 3:20 UTC
37 points
14 comments, 5 min read
(nostalgebraist.tumblr.com)

AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant

DanielFilan, 24 Jun 2021 22:10 UTC
59 points
2 comments, 59 min read

(Double-)Inverse Embedded Agency Problem

Shmi, 8 Jan 2020 4:30 UTC
27 points
8 comments, 2 min read

Beyond Rewards and Values: A Non-dualistic Approach to Universal Intelligence

Akira Pyinya, 30 Dec 2022 19:05 UTC
10 points
4 comments, 14 min read

Causal representation learning as a technique to prevent goal misgeneralization

PabloAMC, 4 Jan 2023 0:07 UTC
21 points
0 comments, 8 min read

Normative vs Descriptive Models of Agency

mattmacdermott, 2 Feb 2023 20:28 UTC
26 points
5 comments, 4 min read

Performance guarantees in classical learning theory and infra-Bayesianism

David Matolcsi, 28 Feb 2023 18:37 UTC
9 points
4 comments, 31 min read

Could Roko’s basilisk acausally bargain with a paperclip maximizer?

Christopher King, 13 Mar 2023 18:21 UTC
1 point
8 comments, 1 min read

[Question] Define “Agent” (Embedded)

Apollonia, 24 Mar 2024 20:14 UTC
10 points
1 comment, 1 min read

On Complexity Science

Garrett Baker, 5 Apr 2024 2:24 UTC
50 points
19 comments, 4 min read

Jonathan Gorard: The territory is isomorphic to an equivalence class of its maps

Daniel C, 7 Sep 2024 10:04 UTC
17 points
18 comments, 2 min read
(x.com)

Live Theory Part 0: Taking Intelligence Seriously

Sahil, 26 Jun 2024 21:37 UTC
101 points
3 comments, 8 min read

Open Problems in AIXI Agent Foundations

Cole Wyeth, 12 Sep 2024 15:38 UTC
41 points
2 comments, 10 min read

[Question] Can subjunctive dependence emerge from a simplicity prior?

Daniel C, 16 Sep 2024 12:39 UTC
6 points
0 comments, 1 min read

Self-Other Overlap: A Neglected Approach to AI Alignment

30 Jul 2024 16:22 UTC
193 points
43 comments, 12 min read

Clarifying the free energy principle (with quotes)

Ryo, 29 Oct 2023 16:03 UTC
8 points
0 comments, 9 min read

The Unavoidable Experience of Free Will in a Deterministic World

gmax, 3 Nov 2023 17:55 UTC
−10 points
0 comments, 2 min read

ACI#6: A Non-Dualistic ACI Model

Akira Pyinya, 9 Nov 2023 23:01 UTC
10 points
2 comments, 6 min read

[Question] Would this be Progress in Solving Embedded Agency?

Johannes C. Mayer, 14 Nov 2023 9:08 UTC
9 points
2 comments, 2 min read

[Question] Is there Work on Embedded Agency in Cellular Automata Toy Models?

Johannes C. Mayer, 14 Nov 2023 9:08 UTC
10 points
0 comments, 1 min read

Apply to the Conceptual Boundaries Workshop for AI Safety

Chipmonk, 27 Nov 2023 21:04 UTC
50 points
0 comments, 3 min read

«Boundaries/Membranes» and AI safety compilation

Chipmonk, 3 May 2023 21:41 UTC
57 points
17 comments, 8 min read

Some Summaries of Agent Foundations Work

mattmacdermott, 15 May 2023 16:09 UTC
62 points
1 comment, 13 min read

What Program Are You?

RobinHanson, 12 Oct 2009 0:29 UTC
36 points
43 comments, 2 min read

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’

lukemarks, 11 Jun 2023 0:13 UTC
22 points
0 comments, 5 min read

Timeless Decision Theory and Meta-Circular Decision Theory

Eliezer Yudkowsky, 20 Aug 2009 22:07 UTC
42 points
37 comments, 10 min read

Demystifying Born’s rule

Christopher King, 14 Jun 2023 3:16 UTC
5 points
26 comments, 3 min read

Minds: An Introduction

Rob Bensinger, 11 Mar 2015 19:00 UTC
52 points
2 comments, 6 min read

Are pre-specified utility functions about the real world possible in principle?

mlogan, 11 Jul 2018 18:46 UTC
24 points
7 comments, 4 min read

Additive and Multiplicative Subagents

Scott Garrabrant, 6 Nov 2020 14:26 UTC
20 points
7 comments, 12 min read

Decision theory is not policy theory is not agent theory

Cole Wyeth, 5 Sep 2023 1:38 UTC
15 points
4 comments, 6 min read
(colewyeth.com)

Troll Bridge

abramdemski, 23 Aug 2019 18:36 UTC
86 points
59 comments, 12 min read

Counterfactual Planning in AGI Systems

Koen.Holtman, 3 Feb 2021 13:54 UTC
10 points
0 comments, 5 min read

Phylactery Decision Theory

Bunthut, 2 Apr 2021 20:55 UTC
14 points
6 comments, 2 min read

Identifiability Problem for Superrational Decision Theories

Bunthut, 9 Apr 2021 20:33 UTC
17 points
16 comments, 2 min read

Escaping the Löbian Obstacle

Morgan_Rogers, 16 Jun 2021 0:02 UTC
14 points
10 comments, 7 min read

Anthropics and Embedded Agency

dadadarren, 26 Jun 2021 1:45 UTC
7 points
2 comments, 2 min read

Optimization Concepts in the Game of Life

16 Oct 2021 20:51 UTC
75 points
16 comments, 10 min read

A Possible Resolution To Spurious Counterfactuals

JoshuaOSHickman, 6 Dec 2021 18:26 UTC
15 points
5 comments, 4 min read

Exploring Decision Theories With Counterfactuals and Dynamic Agent Self-Pointers

JoshuaOSHickman, 18 Dec 2021 21:50 UTC
2 points
0 comments, 4 min read

Formalizing Two Problems of Realistic World Models

So8res, 22 Jan 2015 23:12 UTC
32 points
5 comments, 2 min read

A Rephrasing Of and Footnote To An Embedded Agency Proposal

JoshuaOSHickman, 9 Mar 2022 18:13 UTC
5 points
0 comments, 5 min read

[Question] Choice := Anthropics uncertainty? And potential implications for agency

Antoine de Scorraille, 21 Apr 2022 16:38 UTC
6 points
1 comment, 1 min read

Exploring Mild Behaviour in Embedded Agents

Megan Kinniment, 27 Jun 2022 18:56 UTC
21 points
4 comments, 18 min read

Deliberation, Reactions, and Control: Tentative Definitions and a Restatement of Instrumental Convergence

Oliver Sourbut, 27 Jun 2022 17:25 UTC
12 points
0 comments, 11 min read

Strange Loops—Self-Reference from Number Theory to AI

ojorgensen, 28 Sep 2022 14:10 UTC
15 points
6 comments, 18 min read

Optimization at a Distance

johnswentworth, 16 May 2022 17:58 UTC
88 points
16 comments, 4 min read

LLMs may capture key components of human agency

catubc, 17 Nov 2022 20:14 UTC
27 points
0 comments, 4 min read

Riffing on the agent type

Quinn, 8 Dec 2022 0:19 UTC
21 points
3 comments, 4 min read