Embedded Agents
(A longer text-based version of this post is also available on MIRI’s blog here, and the bibliography for the whole sequence can be found here)
Thanks to this post, I actually have some understanding of what MIRI's Agent Foundations work is about.
This post (and the rest of the sequence) was the first time I had ever read something about AI alignment and thought that it was actually asking the right questions. It is not about a sub-problem, and it is not about marginal improvements. Its goal is a gears-level understanding of agents, and it directly explains why that's hard. It's a list of everything that needs to be figured out in order to remove all the black boxes and Cartesian boundaries, and understand agents as well as we understand refrigerators.
I nominate this post for two reasons.
One, it is an excellent example of providing supplemental writing about basic intuitions and thought processes, which is extremely helpful to me because I do not have a good enough command of the formal work to arrive at those intuitions on my own.
Two, it is one of the few examples of experimenting with different kinds of presentation. I feel like this is underappreciated and under-utilized; finding better ways of communicating seems like a strong baseline requirement of the rationality project, and this post pushes in that direction.
This post has significantly changed my mental model of how to understand key challenges in AI safety, and it has also given me a clearer understanding of, and language for describing, why complex game-theoretic challenges are poorly specified or understood. The terms and concepts in this series of posts have become a key part of my basic intellectual toolkit.
This sequence was the first time I felt I understood MIRI’s research.
(Though I might prefer to nominate the text-version that has the whole sequence in one post.)
I read this sequence as research for my EA/rationality novel. It was really good and also pretty easy to follow, despite my not having any technical background.