Agent Foundations

Why Agent Foundations? An Overly Abstract Explanation

johnswentworth · Mar 25, 2022, 11:17 PM
302 points
58 comments · 8 min read · LW link · 1 review

Embedded Agency (full-text version)

Nov 15, 2018, 7:49 PM
201 points
17 comments · 54 min read · LW link

The Rocket Alignment Problem

Eliezer Yudkowsky · Oct 4, 2018, 12:38 AM
227 points
44 comments · 15 min read · LW link · 2 reviews

Understanding Infra-Bayesianism: A Beginner-Friendly Video Series

Sep 22, 2022, 1:25 PM
140 points
6 comments · 2 min read · LW link

Orthogonal: A new agent foundations alignment organization

Tamsin Leake · Apr 19, 2023, 8:17 PM
217 points
4 comments · 1 min read · LW link
(orxl.org)

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley · Feb 1, 2024, 9:15 PM
15 points
15 comments · 13 min read · LW link

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley · Jan 5, 2024, 8:46 AM
37 points
4 comments · 2 min read · LW link

You won’t solve alignment without agent foundations

Mikhail Samin · Nov 6, 2022, 8:07 AM
27 points
3 comments · 8 min read · LW link

Clarifying the Agent-Like Structure Problem

johnswentworth · Sep 29, 2022, 9:28 PM
60 points
15 comments · 6 min read · LW link

0th Person and 1st Person Logic

Adele Lopez · Mar 10, 2024, 12:56 AM
60 points
28 comments · 6 min read · LW link

Why Simulator AIs want to be Active Inference AIs

Apr 10, 2023, 6:23 PM
93 points
9 comments · 8 min read · LW link · 1 review

Some Summaries of Agent Foundations Work

mattmacdermott · May 15, 2023, 4:09 PM
62 points
1 comment · 13 min read · LW link

My take on agent foundations: formalizing metaphilosophical competence

zhukeepa · Apr 1, 2018, 6:33 AM
21 points
6 comments · 1 min read · LW link

Time complexity for deterministic string machines

alcatal · Apr 21, 2024, 10:35 PM
21 points
2 comments · 21 min read · LW link

formalizing the QACI alignment formal-goal

Jun 10, 2023, 3:28 AM
54 points
6 comments · 13 min read · LW link
(carado.moe)

The Learning-Theoretic Agenda: Status 2023

Vanessa Kosoy · Apr 19, 2023, 5:21 AM
143 points
21 comments · 55 min read · LW link · 3 reviews

[Question] Critiques of the Agent Foundations agenda?

Jsevillamol · Nov 24, 2020, 4:11 PM
16 points
3 comments · 1 min read · LW link

Empirical vs. Mathematical Joints of Nature

Jun 26, 2024, 1:55 AM
35 points
1 comment · 5 min read · LW link

Formalizing the Informal (event invite)

abramdemski · Sep 10, 2024, 7:22 PM
42 points
0 comments · 1 min read · LW link

Live Theory Part 0: Taking Intelligence Seriously

Sahil · Jun 26, 2024, 9:37 PM
103 points
3 comments · 8 min read · LW link

Work with me on agent foundations: independent fellowship

Alex_Altair · Sep 21, 2024, 1:59 PM
59 points
5 comments · 4 min read · LW link

Leaving MIRI, Seeking Funding

abramdemski · Aug 8, 2024, 6:32 PM
264 points
19 comments · 2 min read · LW link

Towards the Operationalization of Philosophy & Wisdom

Thane Ruthenis · Oct 28, 2024, 7:45 PM
20 points
2 comments · 33 min read · LW link
(aiimpacts.org)

Video lectures on the learning-theoretic agenda

Vanessa Kosoy · Oct 27, 2024, 12:01 PM
75 points
0 comments · 1 min read · LW link
(www.youtube.com)

(A → B) → A

Scott Garrabrant · Sep 11, 2018, 10:38 PM
80 points
11 comments · 2 min read · LW link

Refinement of Active Inference agency ontology

Roman Leventov · Dec 15, 2023, 9:31 AM
16 points
0 comments · 5 min read · LW link
(arxiv.org)

Talk: “AI Would Be A Lot Less Alarming If We Understood Agents”

johnswentworth · Dec 17, 2023, 11:46 PM
58 points
3 comments · 1 min read · LW link
(www.youtube.com)

Meaning & Agency

abramdemski · Dec 19, 2023, 10:27 PM
91 points
17 comments · 14 min read · LW link

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)

Thane Ruthenis · Dec 22, 2023, 8:19 PM
74 points
14 comments · 6 min read · LW link

Come join Dovetail’s agent foundations fellowship talks & discussion

Alex_Altair · Feb 15, 2025, 10:10 PM
24 points
0 comments · 1 min read · LW link

The Plan - 2023 Version

johnswentworth · Dec 29, 2023, 11:34 PM
151 points
40 comments · 31 min read · LW link · 1 review

A very non-technical explanation of the basics of infra-Bayesianism

David Matolcsi · Apr 26, 2023, 10:57 PM
62 points
9 comments · 9 min read · LW link

Gauging Interest for a Learning-Theoretic Agenda Mentorship Programme

Vanessa Kosoy · Feb 16, 2025, 4:24 PM
70 points
5 comments · 2 min read · LW link

Uncertainty in all its flavours

Cleo Nardo · Jan 9, 2024, 4:21 PM
27 points
6 comments · 35 min read · LW link

Interpreting Quantum Mechanics in Infra-Bayesian Physicalism

Yegreg · Feb 12, 2024, 6:56 PM
30 points
6 comments · 43 min read · LW link

Abstract Mathematical Concepts vs. Abstractions Over Real-World Systems

Thane Ruthenis · Feb 18, 2025, 6:04 PM
32 points
10 comments · 4 min read · LW link

Most Minds are Irrational

Davidmanheim · Dec 10, 2024, 9:36 AM
17 points
4 comments · 10 min read · LW link

Coherence of Caches and Agents

johnswentworth · Apr 1, 2024, 11:04 PM
77 points
9 comments · 11 min read · LW link

[Question] Take over my project: do computable agents plan against the universal distribution pessimistically?

Cole Wyeth · Feb 19, 2025, 8:17 PM
25 points
3 comments · 3 min read · LW link

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)

Diffractor · Apr 18, 2024, 8:39 AM
34 points
2 comments · 19 min read · LW link

Towards a formalization of the agent structure problem

Alex_Altair · Apr 29, 2024, 8:28 PM
55 points
6 comments · 14 min read · LW link

Linear infra-Bayesian Bandits

Vanessa Kosoy · May 10, 2024, 6:41 AM
39 points
5 comments · 1 min read · LW link
(arxiv.org)

Hierarchical Agency: A Missing Piece in AI Alignment

Jan_Kulveit · Nov 27, 2024, 5:49 AM
112 points
20 comments · 11 min read · LW link

Deep Learning is cheap Solomonoff induction?

Dec 7, 2024, 11:00 AM
44 points
1 comment · 17 min read · LW link

Report & retrospective on the Dovetail fellowship

Alex_Altair · Mar 14, 2025, 11:20 PM
24 points
2 comments · 9 min read · LW link

Infra-Bayesian physicalism: a formal theory of naturalized induction

Vanessa Kosoy · Nov 30, 2021, 10:25 PM
114 points
23 comments · 42 min read · LW link · 1 review

Announcement: Learning Theory Online Course

Jan 20, 2025, 7:55 PM
63 points
32 comments · 4 min read · LW link

Agent Foundations 2025 at CMU

Jan 19, 2025, 11:48 PM
88 points
10 comments · 1 min read · LW link

Consequentialism is in the Stars not Ourselves

DragonGod · Apr 24, 2023, 12:02 AM
7 points
19 comments · 5 min read · LW link

[Closed] Agent Foundations track in MATS

Vanessa Kosoy · Oct 31, 2023, 8:12 AM
54 points
1 comment · 1 min read · LW link
(www.matsprogram.org)

Box inversion revisited

Jan_Kulveit · Nov 7, 2023, 11:09 AM
40 points
3 comments · 8 min read · LW link

Learning-theoretic agenda reading list

Vanessa Kosoy · Nov 9, 2023, 5:25 PM
103 points
1 comment · 2 min read · LW link · 1 review

Game Theory without Argmax [Part 1]

Cleo Nardo · Nov 11, 2023, 3:59 PM
70 points
18 comments · 19 min read · LW link

Game Theory without Argmax [Part 2]

Cleo Nardo · Nov 11, 2023, 4:02 PM
31 points
14 comments · 13 min read · LW link

Public Call for Interest in Mathematical Alignment

Davidmanheim · Nov 22, 2023, 1:22 PM
89 points
9 comments · 1 min read · LW link

What’s next for the field of Agent Foundations?

Nov 30, 2023, 5:55 PM
59 points
23 comments · 10 min read · LW link

Wildfire of strategicness

TsviBT · Jun 5, 2023, 1:59 PM
38 points
19 comments · 1 min read · LW link

My research agenda in agent foundations

Alex_Altair · Jun 28, 2023, 6:00 PM
72 points
9 comments · 11 min read · LW link

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld

DanielFilan · Oct 3, 2023, 9:50 PM
43 points
0 comments · 92 min read · LW link

Challenges with Breaking into MIRI-Style Research

Chris_Leong · Jan 17, 2022, 9:23 AM
75 points
16 comments · 3 min read · LW link

Some AI research areas and their relevance to existential safety

Andrew_Critch · Nov 19, 2020, 3:18 AM
205 points
37 comments · 50 min read · LW link · 2 reviews

AXRP Episode 15 - Natural Abstractions with John Wentworth

DanielFilan · May 23, 2022, 5:40 AM
34 points
1 comment · 58 min read · LW link

[Question] Does agent foundations cover all future ML systems?

Jonas Hallgren · Jul 25, 2022, 1:17 AM
2 points
0 comments · 1 min read · LW link

[Closed] Prize and fast track to alignment research at ALTER

Vanessa Kosoy · Sep 17, 2022, 4:58 PM
63 points
8 comments · 3 min read · LW link

Contra “Strong Coherence”

DragonGod · Mar 4, 2023, 8:05 PM
39 points
24 comments · 1 min read · LW link

Compositional language for hypotheses about computations

Vanessa Kosoy · Mar 11, 2023, 7:43 PM
38 points
6 comments · 12 min read · LW link

Fixed points in mortal population games

ViktoriaMalyasova · Mar 14, 2023, 7:10 AM
31 points
0 comments · 12 min read · LW link
(www.lesswrong.com)

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI

WillPetillo · Dec 4, 2023, 10:58 PM
37 points
0 comments · 35 min read · LW link

Rational Utopia & Narrow Way There: Multiversal AI Alignment, Non-Agentic Static Place AI, New Ethics… (V. 4)

ank · Feb 11, 2025, 3:21 AM
13 points
8 comments · 35 min read · LW link

An Impossibility Proof Relevant to the Shutdown Problem and Corrigibility

Audere · May 2, 2023, 6:52 AM
66 points
13 comments · 9 min read · LW link

Towards Measures of Optimisation

May 12, 2023, 3:29 PM
53 points
37 comments · 4 min read · LW link

Unaligned AGI & Brief History of Inequality

ank · Feb 22, 2025, 4:26 PM
−20 points
4 comments · 7 min read · LW link

Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety

catubc · May 31, 2023, 9:18 PM
26 points
4 comments · 11 min read · LW link

Intelligence–Agency Equivalence ≈ Mass–Energy Equivalence: On Static Nature of Intelligence & Physicalization of Ethics

ank · Feb 22, 2025, 12:12 AM
1 point
0 comments · 6 min read · LW link

Open Problems in AIXI Agent Foundations

Cole Wyeth · Sep 12, 2024, 3:38 PM
42 points
2 comments · 10 min read · LW link

What program structures enable efficient induction?

Daniel C · Sep 5, 2024, 10:12 AM
23 points
5 comments · 3 min read · LW link

Normative vs Descriptive Models of Agency

mattmacdermott · Feb 2, 2023, 8:28 PM
26 points
5 comments · 4 min read · LW link

Gearing Up for Long Timelines in a Hard World

Dalcy · Jul 14, 2023, 6:11 AM
15 points
0 comments · 4 min read · LW link

Can AI agents learn to be good?

Ram Rachum · Aug 29, 2024, 2:20 PM
8 points
0 comments · 1 min read · LW link
(futureoflife.org)

Abstractions are not Natural

Alfred Harwood · Nov 4, 2024, 11:10 AM
25 points
21 comments · 11 min read · LW link

A Straightforward Explanation of the Good Regulator Theorem

Alfred Harwood · Nov 18, 2024, 12:45 PM
36 points
3 comments · 14 min read · LW link

Optimisation Measures: Desiderata, Impossibility, Proposals

Aug 7, 2023, 3:52 PM
36 points
9 comments · 1 min read · LW link

Rebuttals for ~all criticisms of AIXI

Cole Wyeth · Jan 7, 2025, 5:41 PM
20 points
15 comments · 14 min read · LW link

A mostly critical review of infra-Bayesianism

David Matolcsi · Feb 28, 2023, 6:37 PM
104 points
9 comments · 29 min read · LW link

Performance guarantees in classical learning theory and infra-Bayesianism

David Matolcsi · Feb 28, 2023, 6:37 PM
9 points
4 comments · 31 min read · LW link

Another take on agent foundations: formalizing zero-shot reasoning

zhukeepa · Jul 1, 2018, 6:12 AM
64 points
20 comments · 12 min read · LW link

[Question] Popular materials about environmental goals/agent foundations? People wanting to discuss such topics?

Q Home · Jan 22, 2025, 3:30 AM
5 points
0 comments · 1 min read · LW link

Detect Goodhart and shut down

Jeremy Gillen · Jan 22, 2025, 6:45 PM
68 points
21 comments · 7 min read · LW link

Arguments about Highly Reliable Agent Designs as a Useful Path to Artificial Intelligence Safety

Jan 27, 2022, 1:13 PM
27 points
0 comments · 1 min read · LW link
(arxiv.org)

Towards building blocks of ontologies

Feb 8, 2025, 4:03 PM
27 points
0 comments · 26 min read · LW link

Ruling Out Lookup Tables

Alfred Harwood · Feb 4, 2025, 10:39 AM
20 points
11 comments · 7 min read · LW link

An Introduction to Evidential Decision Theory

Babić · Feb 2, 2025, 9:27 PM
5 points
2 comments · 10 min read · LW link

Half-baked idea: a straightforward method for learning environmental goals?

Q Home · Feb 4, 2025, 6:56 AM
16 points
7 comments · 5 min read · LW link

Distilling the Internal Model Principle

JoseFaustino · Feb 8, 2025, 2:59 PM
21 points
0 comments · 16 min read · LW link

100 Dinners And A Workshop: Information Preservation And Goals

Stephen Fowler · Mar 28, 2023, 3:13 AM
8 points
0 comments · 7 min read · LW link

Repeated Play of Imperfect Newcomb’s Paradox in Infra-Bayesian Physicalism

Sven Nilsen · Apr 3, 2023, 10:06 AM
2 points
0 comments · 2 min read · LW link

Goal alignment without alignment on epistemology, ethics, and science is futile

Roman Leventov · Apr 7, 2023, 8:22 AM
20 points
2 comments · 2 min read · LW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley · Feb 14, 2024, 7:10 AM
41 points
12 comments · 31 min read · LW link

[Question] Choice := Anthropics uncertainty? And potential implications for agency

Antoine de Scorraille · Apr 21, 2022, 4:38 PM
6 points
1 comment · 1 min read · LW link

7. Evolution and Ethics

RogerDearnaley · Feb 15, 2024, 11:38 PM
3 points
6 comments · 6 min read · LW link

Understanding Selection Theorems

adamk · May 28, 2022, 1:49 AM
41 points
3 comments · 7 min read · LW link

Three Types of Constraints in the Space of Agents

Jan 15, 2024, 5:27 PM
26 points
3 comments · 17 min read · LW link

Infra-Bayesianism naturally leads to the monotonicity principle, and I think this is a problem

David Matolcsi · Apr 26, 2023, 9:39 PM
19 points
6 comments · 4 min read · LW link

Bridging Expected Utility Maximization and Optimization

Daniel Herrmann · Aug 5, 2022, 8:18 AM
25 points
5 comments · 14 min read · LW link

Discovering Agents

zac_kenton · Aug 18, 2022, 5:33 PM
73 points
11 comments · 6 min read · LW link

A Generalization of the Good Regulator Theorem

Alfred Harwood · Jan 4, 2025, 9:55 AM
20 points
6 comments · 10 min read · LW link

Infra-Bayesian haggling

hannagabor · May 20, 2024, 12:23 PM
28 points
0 comments · 20 min read · LW link

Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning

Roman Leventov · Jan 12, 2023, 4:43 PM
17 points
2 comments · 2 min read · LW link
(arxiv.org)