Embedded Agency

Tag · Last edit: Jan 4, 2023, 2:57 AM by Daniel_Eth

Embedded Agency is the problem of building a theory of rational agents that accounts for the fact that the agents we create (and we ourselves) are inside the world we are trying to affect, not separate from it. This contrasts with much of the current basic theory of AI and rationality (such as Solomonoff induction or Bayesianism), which implicitly assumes a separation between the agent and the things the agent has beliefs about. In other words, agents in this universe do not have the Cartesian or dualistic boundaries that much of philosophy assumes. Instead, a reductionist picture applies: agents are made up of non-agent parts such as bits and atoms.

Embedded Agency is not a fully formalized research agenda, but Scott Garrabrant and Abram Demski have written the canonical explanation of the idea in their sequence Embedded Agency. The sequence points to many of the core confusions we have about rational agency and attempts to tie them into a single picture.

Embedded Agency (full-text version)

Nov 15, 2018, 7:49 PM
201 points
17 comments · 54 min read · LW link

Humans Are Embedded Agents Too

johnswentworth · Dec 23, 2019, 7:21 PM
82 points
21 comments · 5 min read · LW link

Embedded Agents

Oct 29, 2018, 7:53 PM
233 points
42 comments · 1 min read · LW link · 2 reviews

Introduction to Cartesian Frames

Scott Garrabrant · Oct 22, 2020, 1:00 PM
155 points
32 comments · 22 min read · LW link · 1 review

Draft papers for REALab and Decoupled Approval on tampering

Oct 28, 2020, 4:01 PM
47 points
2 comments · 1 min read · LW link

Subsystem Alignment

Nov 6, 2018, 4:16 PM
102 points
12 comments · 1 min read · LW link

Decision Theory

Oct 31, 2018, 6:41 PM
121 points
45 comments · 1 min read · LW link

Embedded World-Models

Nov 2, 2018, 4:07 PM
96 points
16 comments · 1 min read · LW link

Embedded Agency via Abstraction

johnswentworth · Aug 26, 2019, 11:03 PM
42 points
20 comments · 11 min read · LW link

Robust Delegation

Nov 4, 2018, 4:38 PM
116 points
10 comments · 1 min read · LW link

Updates and additions to “Embedded Agency”

Aug 29, 2020, 4:22 AM
82 points
1 comment · 3 min read · LW link

You Only Get One Shot: an Intuition Pump for Embedded Agency

Oliver Sourbut · Jun 9, 2022, 9:38 PM
24 points
4 comments · 2 min read · LW link

Embedded Curiosities

Nov 8, 2018, 2:19 PM
91 points
1 comment · 2 min read · LW link

MIRI/OP exchange about decision theory

Rob Bensinger · Aug 25, 2021, 10:44 PM
56 points
7 comments · 10 min read · LW link

Embedded Agents are Quines

Dec 12, 2023, 4:57 AM
11 points
7 comments · 8 min read · LW link

Meaning & Agency

abramdemski · Dec 19, 2023, 10:27 PM
91 points
17 comments · 14 min read · LW link

Uncertainty in all its flavours

Cleo Nardo · Jan 9, 2024, 4:21 PM
27 points
6 comments · 35 min read · LW link

All the Following are Distinct

Gianluca Calcagni · Aug 2, 2024, 4:35 PM
16 points
3 comments · 9 min read · LW link

[Question] Are You More Real If You’re Really Forgetful?

Thane Ruthenis · Nov 24, 2024, 7:30 PM
39 points
25 comments · 5 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Mar 13, 2025, 7:09 PM
150 points
40 comments · 5 min read · LW link

Mistral Large 2 (123B) exhibits alignment faking

Mar 27, 2025, 3:39 PM
58 points
2 comments · 13 min read · LW link

“embedded self-justification,” or something like that

nostalgebraist · Nov 3, 2019, 3:20 AM
37 points
14 comments · 5 min read · LW link
(nostalgebraist.tumblr.com)

(Double-)Inverse Embedded Agency Problem

Shmi · Jan 8, 2020, 4:30 AM
27 points
8 comments · 2 min read · LW link

Embedded Agency: Not Just an AI Problem

johnswentworth · Jun 27, 2019, 12:35 AM
15 points
10 comments · 2 min read · LW link

(A → B) → A

Scott Garrabrant · Sep 11, 2018, 10:38 PM
80 points
11 comments · 2 min read · LW link

Botworld: a cellular automaton for studying self-modifying agents embedded in their environment

So8res · Apr 12, 2014, 12:56 AM
80 points
54 comments · 7 min read · LW link

When does rationality-as-search have nontrivial implications?

nostalgebraist · Nov 4, 2018, 10:42 PM
72 points
12 comments · 3 min read · LW link

Logical Updatelessness as a Robust Delegation Problem

Scott Garrabrant · Oct 27, 2017, 9:16 PM
38 points
2 comments · 2 min read · LW link

[Question] What are brains?

Valentine · Jun 10, 2023, 2:46 PM
10 points
22 comments · 2 min read · LW link

The whirlpool of reality

Gordon Seidoh Worley · Sep 27, 2020, 2:36 AM
9 points
2 comments · 2 min read · LW link

Additive Operations on Cartesian Frames

Scott Garrabrant · Oct 26, 2020, 3:12 PM
62 points
6 comments · 11 min read · LW link

Biextensional Equivalence

Scott Garrabrant · Oct 28, 2020, 2:07 PM
43 points
13 comments · 10 min read · LW link

Controllables and Observables, Revisited

Scott Garrabrant · Oct 29, 2020, 4:38 PM
35 points
5 comments · 8 min read · LW link

Functors and Coarse Worlds

Scott Garrabrant · Oct 30, 2020, 3:19 PM
52 points
3 comments · 8 min read · LW link

Sub-Sums and Sub-Tensors

Scott Garrabrant · Nov 5, 2020, 6:06 PM
34 points
4 comments · 8 min read · LW link

Multiplicative Operations on Cartesian Frames

Scott Garrabrant · Nov 3, 2020, 7:27 PM
34 points
24 comments · 12 min read · LW link

Subagents of Cartesian Frames

Scott Garrabrant · Nov 2, 2020, 10:02 PM
53 points
6 comments · 8 min read · LW link

Cartesian Frames Definitions

Rob Bensinger · Nov 8, 2020, 12:44 PM
28 points
0 comments · 4 min read · LW link

Committing, Assuming, Externalizing, and Internalizing

Scott Garrabrant · Nov 9, 2020, 4:59 PM
31 points
25 comments · 10 min read · LW link

Eight Definitions of Observability

Scott Garrabrant · Nov 10, 2020, 11:37 PM
34 points
26 comments · 12 min read · LW link

Time in Cartesian Frames

Scott Garrabrant · Nov 11, 2020, 8:25 PM
48 points
16 comments · 7 min read · LW link

AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant

DanielFilan · Jun 24, 2021, 10:10 PM
59 points
2 comments · 59 min read · LW link

Infra-Bayesianism Distillation: Realizability and Decision Theory

Thomas Larsen · May 26, 2022, 9:57 PM
40 points
9 comments · 18 min read · LW link

General alignment properties

TurnTrout · Aug 8, 2022, 11:40 PM
50 points
2 comments · 1 min read · LW link

Consequentialists: One-Way Pattern Traps

David Udell · Jan 16, 2023, 8:48 PM
59 points
3 comments · 14 min read · LW link

Some Summaries of Agent Foundations Work

mattmacdermott · May 15, 2023, 4:09 PM
62 points
1 comment · 13 min read · LW link

What Program Are You?

RobinHanson · Oct 12, 2009, 12:29 AM
36 points
43 comments · 2 min read · LW link

[Question] Define “Agent” (Embedded)

Apollonia · Mar 24, 2024, 8:14 PM
10 points
1 comment · 1 min read · LW link

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’

lukemarks · Jun 11, 2023, 12:13 AM
22 points
0 comments · 5 min read · LW link

Timeless Decision Theory and Meta-Circular Decision Theory

Eliezer Yudkowsky · Aug 20, 2009, 10:07 PM
42 points
37 comments · 10 min read · LW link

Demystifying Born’s rule

Christopher King · Jun 14, 2023, 3:16 AM
5 points
26 comments · 3 min read · LW link

Minds: An Introduction

Rob Bensinger · Mar 11, 2015, 7:00 PM
52 points
2 comments · 6 min read · LW link

Are pre-specified utility functions about the real world possible in principle?

mlogan · Jul 11, 2018, 6:46 PM
24 points
7 comments · 4 min read · LW link

Static Place AI Makes Agentic AI Redundant: Multiversal AI Alignment & Rational Utopia

ank · Feb 13, 2025, 10:35 PM
1 point
2 comments · 11 min read · LW link

Exploring Mild Behaviour in Embedded Agents

Megan Kinniment · Jun 27, 2022, 6:56 PM
21 points
4 comments · 18 min read · LW link

On Complexity Science

Garrett Baker · Apr 5, 2024, 2:24 AM
50 points
19 comments · 4 min read · LW link

Deliberation, Reactions, and Control: Tentative Definitions and a Restatement of Instrumental Convergence

Oliver Sourbut · Jun 27, 2022, 5:25 PM
12 points
0 comments · 11 min read · LW link

Performance guarantees in classical learning theory and infra-Bayesianism

David Matolcsi · Feb 28, 2023, 6:37 PM
9 points
4 comments · 31 min read · LW link

Strange Loops—Self-Reference from Number Theory to AI

ojorgensen · Sep 28, 2022, 2:10 PM
17 points
6 comments · 18 min read · LW link

Optimization at a Distance

johnswentworth · May 16, 2022, 5:58 PM
88 points
16 comments · 4 min read · LW link

LLMs may capture key components of human agency

catubc · Nov 17, 2022, 8:14 PM
27 points
0 comments · 4 min read · LW link

Riffing on the agent type

Quinn · Dec 8, 2022, 12:19 AM
21 points
3 comments · 4 min read · LW link

Beyond Rewards and Values: A Non-dualistic Approach to Universal Intelligence

Akira Pyinya · Dec 30, 2022, 7:05 PM
10 points
4 comments · 14 min read · LW link

Additive and Multiplicative Subagents

Scott Garrabrant · Nov 6, 2020, 2:26 PM
20 points
7 comments · 12 min read · LW link

Causal representation learning as a technique to prevent goal misgeneralization

PabloAMC · Jan 4, 2023, 12:07 AM
21 points
0 comments · 8 min read · LW link

Action theory is not policy theory is not agent theory

Cole Wyeth · Sep 5, 2023, 1:38 AM
15 points
4 comments · 6 min read · LW link
(colewyeth.com)

Could Roko’s basilisk acausally bargain with a paperclip maximizer?

Christopher King · Mar 13, 2023, 6:21 PM
1 point
8 comments · 1 min read · LW link

Troll Bridge

abramdemski · Aug 23, 2019, 6:36 PM
86 points
59 comments · 12 min read · LW link

Counterfactual Planning in AGI Systems

Koen.Holtman · Feb 3, 2021, 1:54 PM
10 points
0 comments · 5 min read · LW link

Phylactery Decision Theory

Bunthut · Apr 2, 2021, 8:55 PM
14 points
6 comments · 2 min read · LW link

Identifiability Problem for Superrational Decision Theories

Bunthut · Apr 9, 2021, 8:33 PM
17 points
16 comments · 2 min read · LW link

Escaping the Löbian Obstacle

Morgan_Rogers · Jun 16, 2021, 12:02 AM
14 points
10 comments · 7 min read · LW link

Normative vs Descriptive Models of Agency

mattmacdermott · Feb 2, 2023, 8:28 PM
26 points
5 comments · 4 min read · LW link

Anthropics and Embedded Agency

dadadarren · Jun 26, 2021, 1:45 AM
7 points
2 comments · 2 min read · LW link

Clarifying the free energy principle (with quotes)

Ryo · Oct 29, 2023, 4:03 PM
8 points
0 comments · 9 min read · LW link

The Unavoidable Experience of Free Will in a Deterministic World

gmax · Nov 3, 2023, 5:55 PM
−10 points
0 comments · 2 min read · LW link

ACI#6: A Non-Dualistic ACI Model

Akira Pyinya · Nov 9, 2023, 11:01 PM
10 points
2 comments · 6 min read · LW link

[Question] Would this be Progress in Solving Embedded Agency?

Johannes C. Mayer · Nov 14, 2023, 9:08 AM
9 points
2 comments · 2 min read · LW link

[Question] Is there Work on Embedded Agency in Cellular Automata Toy Models?

Johannes C. Mayer · Nov 14, 2023, 9:08 AM
10 points
0 comments · 1 min read · LW link

Apply to the Conceptual Boundaries Workshop for AI Safety

Chipmonk · Nov 27, 2023, 9:04 PM
50 points
0 comments · 3 min read · LW link

Rational Utopia & Narrow Way There: Multiversal AI Alignment, Non-Agentic Static Place AI, New Ethics… (V. 4)

ank · Feb 11, 2025, 3:21 AM
13 points
8 comments · 35 min read · LW link

Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

Feb 4, 2025, 8:34 PM
45 points
22 comments · 5 min read · LW link

Rebuttals for ~all criticisms of AIXI

Cole Wyeth · Jan 7, 2025, 5:41 PM
20 points
15 comments · 14 min read · LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

Jul 30, 2024, 4:22 PM
215 points
49 comments · 12 min read · LW link

Unaligned AGI & Brief History of Inequality

ank · Feb 22, 2025, 4:26 PM
−20 points
4 comments · 7 min read · LW link

[Question] Can subjunctive dependence emerge from a simplicity prior?

Daniel C · Sep 16, 2024, 12:39 PM
11 points
0 comments · 1 min read · LW link

Open Problems in AIXI Agent Foundations

Cole Wyeth · Sep 12, 2024, 3:38 PM
42 points
2 comments · 10 min read · LW link

Live Theory Part 0: Taking Intelligence Seriously

Sahil · Jun 26, 2024, 9:37 PM
103 points
3 comments · 8 min read · LW link

Optimization Concepts in the Game of Life

Oct 16, 2021, 8:51 PM
75 points
16 comments · 10 min read · LW link

A Possible Resolution To Spurious Counterfactuals

JoshuaOSHickman · Dec 6, 2021, 6:26 PM
15 points
5 comments · 4 min read · LW link

Exploring Decision Theories With Counterfactuals and Dynamic Agent Self-Pointers

JoshuaOSHickman · Dec 18, 2021, 9:50 PM
2 points
0 comments · 4 min read · LW link

Jonothan Gorard: The territory is isomorphic to an equivalence class of its maps

Daniel C · Sep 7, 2024, 10:04 AM
19 points
18 comments · 2 min read · LW link
(x.com)

Formalizing Two Problems of Realistic World Models

So8res · Jan 22, 2015, 11:12 PM
32 points
5 comments · 2 min read · LW link

A Rephrasing Of and Footnote To An Embedded Agency Proposal

JoshuaOSHickman · Mar 9, 2022, 6:13 PM
5 points
0 comments · 5 min read · LW link

«Boundaries/Membranes» and AI safety compilation

Chipmonk · May 3, 2023, 9:41 PM
56 points
17 comments · 8 min read · LW link

[Question] Choice := Anthropics uncertainty? And potential implications for agency

Antoine de Scorraille · Apr 21, 2022, 4:38 PM
6 points
1 comment · 1 min read · LW link