Agent Foundations

Tag

Why Agent Foundations? An Overly Abstract Explanation

johnswentworthMar 25, 2022, 11:17 PM

309 points

58 comments8 min readLW link 1 review

Embedded Agency (full-text version)

Scott Garrabrant and abramdemski

Nov 15, 2018, 7:49 PM

209 points

17 comments54 min readLW link

The Rocket Alignment Problem

Eliezer YudkowskyOct 4, 2018, 12:38 AM

228 points

44 comments15 min readLW link 2 reviews

Some Summaries of Agent Foundations Work

mattmacdermottMay 15, 2023, 4:09 PM

62 points

1 comment13 min readLW link

Understanding Infra-Bayesianism: A Beginner-Friendly Video Series

Jack Parker and Connall Garrod

Sep 22, 2022, 1:25 PM

140 points

6 comments2 min readLW link

Orthogonal: A new agent foundations alignment organization

Tamsin LeakeApr 19, 2023, 8:17 PM

217 points

4 comments1 min readLW link

(orxl.org)

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaleyJan 5, 2024, 8:46 AM

37 points

4 comments2 min readLW link

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaleyFeb 1, 2024, 9:15 PM

16 points

15 comments13 min readLW link

Working through a small tiling result

James PayorMay 13, 2025, 8:28 PM

66 points

9 comments5 min readLW link

You won’t solve alignment without agent foundations

Mikhail SaminNov 6, 2022, 8:07 AM

29 points

3 comments8 min readLW link

Why Simulator AIs want to be Active Inference AIs

Jan_Kulveit and rosehadshar

Apr 10, 2023, 6:23 PM

96 points

9 comments8 min readLW link 1 review

Clarifying the Agent-Like Structure Problem

johnswentworthSep 29, 2022, 9:28 PM

63 points

19 comments6 min readLW link

0th Person and 1st Person Logic

Adele LopezMar 10, 2024, 12:56 AM

60 points

28 comments6 min readLW link

My take on agent foundations: formalizing metaphilosophical competence

zhukeepaApr 1, 2018, 6:33 AM

21 points

6 comments1 min readLW link

Short Timelines Don’t Devalue Long Horizon Research

Vladimir_NesovApr 9, 2025, 12:42 AM

167 points

24 comments1 min readLW link

formalizing the QACI alignment formal-goal

Tamsin Leake and JuliaHP

Jun 10, 2023, 3:28 AM

54 points

6 comments13 min readLW link

(carado.moe)

The Learning-Theoretic Agenda: Status 2023

Vanessa KosoyApr 19, 2023, 5:21 AM

144 points

22 comments56 min readLW link 3 reviews

Non-Monotonic Infra-Bayesian Physicalism

Marcus OgrenApr 2, 2025, 12:14 PM

34 points

0 comments18 min readLW link

Time complexity for deterministic string machines

alcatalApr 21, 2024, 10:35 PM

21 points

2 comments21 min readLW link

[Question] Critiques of the Agent Foundations agenda?

JsevillamolNov 24, 2020, 4:11 PM

16 points

3 comments1 min readLW link

Fixed points in mortal population games

ViktoriaMalyasovaMar 14, 2023, 7:10 AM

31 points

0 comments12 min readLW link

(www.lesswrong.com)

Empirical vs. Mathematical Joints of Nature

Elizabeth and Alex_Altair

Jun 26, 2024, 1:55 AM

35 points

1 comment5 min readLW link

Wildfire of strategicness

TsviBTJun 5, 2023, 1:59 PM

38 points

19 comments1 min readLW link

Announcement: Learning Theory Online Course

Yegreg and Alex Flint

Jan 20, 2025, 7:55 PM

63 points

33 comments4 min readLW link

Live Theory Part 0: Taking Intelligence Seriously

SahilJun 26, 2024, 9:37 PM

101 points

3 comments8 min readLW link

Towards a formalization of the agent structure problem

Alex_AltairApr 29, 2024, 8:28 PM

55 points

6 comments14 min readLW link

Proceedings of ILIAD: Lessons and Progress

Alexander Gietelink Oldenziel and JessRiedel

Apr 28, 2025, 7:04 PM

77 points

5 comments8 min readLW link

Come join Dovetail’s agent foundations fellowship talks & discussion

Alex_AltairFeb 15, 2025, 10:10 PM

24 points

0 comments1 min readLW link

A very non-technical explanation of the basics of infra-Bayesianism

David MatolcsiApr 26, 2023, 10:57 PM

62 points

9 comments9 min readLW link

[Question] Does agent foundations cover all future ML systems?

Jonas HallgrenJul 25, 2022, 1:17 AM

4 points

0 comments1 min readLW link

Uncertainty in all its flavours

Cleo NardoJan 9, 2024, 4:21 PM

34 points

6 comments35 min readLW link

Is alignment reducible to becoming more coherent?

Cole WyethApr 22, 2025, 11:47 PM

19 points

0 comments3 min readLW link

Meaning & Agency

abramdemskiDec 19, 2023, 10:27 PM

91 points

17 comments14 min readLW link

[Question] Take over my project: do computable agents plan against the universal distribution pessimistically?

Cole WyethFeb 19, 2025, 8:17 PM

25 points

3 comments3 min readLW link

Video lectures on the learning-theoretic agenda

Vanessa KosoyOct 27, 2024, 12:01 PM

75 points

0 comments1 min readLW link

(www.youtube.com)

Abstract Mathematical Concepts vs. Abstractions Over Real-World Systems

Thane RuthenisFeb 18, 2025, 6:04 PM

32 points

10 comments4 min readLW link

Game Theory without Argmax [Part 2]

Cleo NardoNov 11, 2023, 4:02 PM

31 points

14 comments13 min readLW link

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)

Thane RuthenisDec 22, 2023, 8:19 PM

74 points

14 comments6 min readLW link

New Paper: Infra-Bayesian Decision-Estimation Theory

Vanessa Kosoy and Diffractor

Apr 10, 2025, 9:17 AM

77 points

4 comments1 min readLW link

(arxiv.org)

Infra-Bayesian physicalism: a formal theory of naturalized induction

Vanessa KosoyNov 30, 2021, 10:25 PM

114 points

23 comments42 min readLW link 1 review

Consequentialism is in the Stars not Ourselves

DragonGodApr 24, 2023, 12:02 AM

7 points

19 comments5 min readLW link

Report & retrospective on the Dovetail fellowship

Alex_AltairMar 14, 2025, 11:20 PM

26 points

3 comments9 min readLW link

Linear infra-Bayesian Bandits

Vanessa KosoyMay 10, 2024, 6:41 AM

40 points

5 comments1 min readLW link

(arxiv.org)

Interpreting Quantum Mechanics in Infra-Bayesian Physicalism

YegregFeb 12, 2024, 6:56 PM

30 points

6 comments43 min readLW link

[Closed] Gauging Interest for a Learning-Theoretic Agenda Mentorship Programme

Vanessa KosoyFeb 16, 2025, 4:24 PM

54 points

5 comments2 min readLW link

Formalizing the Informal (event invite)

abramdemskiSep 10, 2024, 7:22 PM

42 points

0 comments1 min readLW link

Talk: “AI Would Be A Lot Less Alarming If We Understood Agents”

johnswentworthDec 17, 2023, 11:46 PM

58 points

3 comments1 min readLW link

(www.youtube.com)

[Closed] Prize and fast track to alignment research at ALTER

Vanessa KosoySep 17, 2022, 4:58 PM

63 points

8 comments3 min readLW link

Glass box learners want to be black box

Cole WyethMay 10, 2025, 11:05 AM

46 points

10 comments4 min readLW link

Challenges with Breaking into MIRI-Style Research

Chris_LeongJan 17, 2022, 9:23 AM

75 points

16 comments2 min readLW link

Coherence of Caches and Agents

johnswentworthApr 1, 2024, 11:04 PM

78 points

9 comments11 min readLW link

Game Theory without Argmax [Part 1]

Cleo NardoNov 11, 2023, 3:59 PM

70 points

18 comments19 min readLW link

Some AI research areas and their relevance to existential safety

Andrew_CritchNov 19, 2020, 3:18 AM

205 points

37 comments50 min readLW link 2 reviews

Learning-theoretic agenda reading list

Vanessa KosoyNov 9, 2023, 5:25 PM

103 points

1 comment2 min readLW link 1 review

The Plan − 2023 Version

johnswentworthDec 29, 2023, 11:34 PM

152 points

40 comments31 min readLW link 1 review

(A → B) → A

Scott GarrabrantSep 11, 2018, 10:38 PM

80 points

11 comments2 min readLW link

Hierarchical Agency: A Missing Piece in AI Alignment

Jan_KulveitNov 27, 2024, 5:49 AM

112 points

21 comments11 min readLW link

Leaving MIRI, Seeking Funding

abramdemskiAug 8, 2024, 6:32 PM

264 points

19 comments2 min readLW link

Work with me on agent foundations: independent fellowship

Alex_AltairSep 21, 2024, 1:59 PM

59 points

5 comments4 min readLW link

What’s next for the field of Agent Foundations?

Nora_Ammann, Alexander Gietelink Oldenziel and mattmacdermott

Nov 30, 2023, 5:55 PM

59 points

23 comments10 min readLW link

Public Call for Interest in Mathematical Alignment

DavidmanheimNov 22, 2023, 1:22 PM

90 points

9 comments1 min readLW link

Towards the Operationalization of Philosophy & Wisdom

Thane RuthenisOct 28, 2024, 7:45 PM

20 points

2 comments33 min readLW link

(aiimpacts.org)

Contra “Strong Coherence”

DragonGodMar 4, 2023, 8:05 PM

39 points

24 comments1 min readLW link

Refinement of Active Inference agency ontology

Roman LeventovDec 15, 2023, 9:31 AM

16 points

0 comments5 min readLW link

(arxiv.org)

AXRP Episode 15 - Natural Abstractions with John Wentworth

DanielFilanMay 23, 2022, 5:40 AM

34 points

1 comment58 min readLW link

Box inversion revisited

Jan_KulveitNov 7, 2023, 11:09 AM

40 points

3 comments8 min readLW link

Agent Foundations 2025 at CMU

Alexander Gietelink Oldenziel and windows

Jan 19, 2025, 11:48 PM

90 points

10 comments1 min readLW link

My research agenda in agent foundations

Alex_AltairJun 28, 2023, 6:00 PM

75 points

9 comments11 min readLW link

Compositional language for hypotheses about computations

Vanessa KosoyMar 11, 2023, 7:43 PM

38 points

6 comments12 min readLW link

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld

DanielFilanOct 3, 2023, 9:50 PM

43 points

0 comments92 min readLW link

[Closed] Agent Foundations track in MATS

Vanessa KosoyOct 31, 2023, 8:12 AM

54 points

1 comment1 min readLW link

(www.matsprogram.org)

Most Minds are Irrational

DavidmanheimDec 10, 2024, 9:36 AM

17 points

4 comments10 min readLW link

Deep Learning is cheap Solomonoff induction?

Lucius Bushnaq, Kaarel and Dmitry Vaintrob

Dec 7, 2024, 11:00 AM

45 points

1 comment17 min readLW link

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)

DiffractorApr 18, 2024, 8:39 AM

34 points

2 comments19 min readLW link

What is Inadequate about Bayesianism for AI Alignment: Motivating Infra-Bayesianism

Brittany GelbMay 1, 2025, 7:06 PM

17 points

0 comments7 min readLW link

Ruling Out Lookup Tables

Alfred HarwoodFeb 4, 2025, 10:39 AM

22 points

11 comments7 min readLW link

Arguments about Highly Reliable Agent Designs as a Useful Path to Artificial Intelligence Safety

riceissa and Davidmanheim

Jan 27, 2022, 1:13 PM

27 points

0 comments1 min readLW link

(arxiv.org)

Proof Section to an Introduction to Reinforcement Learning for Understanding Infra-Bayesianism

Brittany GelbMay 17, 2025, 2:36 AM

3 points

0 comments9 min readLW link

Distilling the Internal Model Principle part II

JoseFaustinoApr 30, 2025, 5:56 PM

15 points

0 comments19 min readLW link

[Question] Popular materials about environmental goals/agent foundations? People wanting to discuss such topics?

Q HomeJan 22, 2025, 3:30 AM

5 points

0 comments1 min readLW link

Optimisation Measures: Desiderata, Impossibility, Proposals

mattmacdermott and Alexander Gietelink Oldenziel

Aug 7, 2023, 3:52 PM

36 points

9 comments1 min readLW link

A mostly critical review of infra-Bayesianism

David MatolcsiFeb 28, 2023, 6:37 PM

108 points

9 comments29 min readLW link

Towards Measures of Optimisation

mattmacdermott and Alexander Gietelink Oldenziel

May 12, 2023, 3:29 PM

53 points

37 comments4 min readLW link

Detect Goodhart and shut down

Jeremy GillenJan 22, 2025, 6:45 PM

70 points

21 comments7 min readLW link

An Introduction to Evidential Decision Theory

BabićFeb 2, 2025, 9:27 PM

5 points

2 comments10 min readLW link

Towards building blocks of ontologies

Daniel C, Alex_Altair, Dalcy, Alfred Harwood and JoseFaustino

Feb 8, 2025, 4:03 PM

29 points

0 comments26 min readLW link

Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning

Roman LeventovJan 12, 2023, 4:43 PM

17 points

2 comments2 min readLW link

(arxiv.org)

Three Types of Constraints in the Space of Agents

Nora_Ammann and Mateusz Bagiński

Jan 15, 2024, 5:27 PM

26 points

3 comments17 min readLW link

Rational Effective Utopia & Narrow Way There: Multiversal AI Alignment, Place AI, New Ethicophysics… (Updated)

ankFeb 11, 2025, 3:21 AM

13 points

8 comments35 min readLW link

Gearing Up for Long Timelines in a Hard World

DalcyJul 14, 2023, 6:11 AM

18 points

0 comments4 min readLW link

Abstractions are not Natural

Alfred HarwoodNov 4, 2024, 11:10 AM

25 points

21 comments11 min readLW link

Intelligence–Agency Equivalence ≈ Mass–Energy Equivalence: On Static Nature of Intelligence & Physicalization of Ethics

ankFeb 22, 2025, 12:12 AM

1 point

0 comments6 min readLW link

Rebuttals for ~all criticisms of AIXI

Cole WyethJan 7, 2025, 5:41 PM

25 points

17 comments14 min readLW link

Understanding Selection Theorems

adamkMay 28, 2022, 1:49 AM

41 points

3 comments7 min readLW link

Performance guarantees in classical learning theory and infra-Bayesianism

David MatolcsiFeb 28, 2023, 6:37 PM

9 points

4 comments31 min readLW link

Distilling the Internal Model Principle

JoseFaustinoFeb 8, 2025, 2:59 PM

21 points

0 comments16 min readLW link

Can AI agents learn to be good?

Ram RachumAug 29, 2024, 2:20 PM

8 points

0 comments1 min readLW link

(futureoflife.org)

Infra-Bayesianism naturally leads to the monotonicity principle, and I think this is a problem

David MatolcsiApr 26, 2023, 9:39 PM

22 points

6 comments4 min readLW link

Goal alignment without alignment on epistemology, ethics, and science is futile

Roman LeventovApr 7, 2023, 8:22 AM

20 points

2 comments2 min readLW link

Bridging Expected Utility Maximization and Optimization

Daniel HerrmannAug 5, 2022, 8:18 AM

25 points

5 comments14 min readLW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaleyFeb 14, 2024, 7:10 AM

41 points

12 comments31 min readLW link

Open Problems in AIXI Agent Foundations

Cole WyethSep 12, 2024, 3:38 PM

42 points

2 comments10 min readLW link

A Generalization of the Good Regulator Theorem

Alfred HarwoodJan 4, 2025, 9:55 AM

20 points

6 comments10 min readLW link

Infra-Bayesian haggling

hannagaborMay 20, 2024, 12:23 PM

28 points

0 comments20 min readLW link

Discovering Agents

zac_kentonAug 18, 2022, 5:33 PM

73 points

11 comments6 min readLW link

Unaligned AGI & Brief History of Inequality

ankFeb 22, 2025, 4:26 PM

−20 points

4 comments7 min readLW link

100 Dinners And A Workshop: Information Preservation And Goals

Stephen FowlerMar 28, 2023, 3:13 AM

8 points

0 comments7 min readLW link

Half-baked idea: a straightforward method for learning environmental goals?

Q HomeFeb 4, 2025, 6:56 AM

16 points

7 comments5 min readLW link

[Question] Choice := Anthropics uncertainty? And potential implications for agency

Antoine de ScorrailleApr 21, 2022, 4:38 PM

6 points

1 comment1 min readLW link

7. Evolution and Ethics

RogerDearnaleyFeb 15, 2024, 11:38 PM

3 points

7 comments6 min readLW link

Repeated Play of Imperfect Newcomb’s Paradox in Infra-Bayesian Physicalism

Sven NilsenApr 3, 2023, 10:06 AM

2 points

0 comments2 min readLW link

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI

WillPetilloDec 4, 2023, 10:58 PM

37 points

0 comments35 min readLW link

Another take on agent foundations: formalizing zero-shot reasoning

zhukeepaJul 1, 2018, 6:12 AM

64 points

20 comments12 min readLW link

An Impossibility Proof Relevant to the Shutdown Problem and Corrigibility

AudereMay 2, 2023, 6:52 AM

66 points

13 comments9 min readLW link

An Introduction to Reinforcement Learning for Understanding Infra-Bayesianism

Brittany GelbMay 17, 2025, 2:34 AM

11 points

0 comments20 min readLW link

A Straightforward Explanation of the Good Regulator Theorem

Alfred HarwoodNov 18, 2024, 12:45 PM

75 points

18 comments14 min readLW link

Normative vs Descriptive Models of Agency

mattmacdermottFeb 2, 2023, 8:28 PM

26 points

5 comments4 min readLW link

What program structures enable efficient induction?

Daniel CSep 5, 2024, 10:12 AM

23 points

5 comments3 min readLW link

Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety

catubcMay 31, 2023, 9:18 PM

26 points

4 comments11 min readLW link

Thou shalt not command an alighned AI

Martin VlachMay 11, 2025, 8:02 PM

0 points

4 comments1 min readLW link
