Trust
Rule Thinkers In, Not Out
Scott Alexander
Gears vs Behavior
John S. Wentworth
Book Review: The Secret Of Our Success
Scott Alexander
Reason isn’t magic
Ben Hoffman
“Other people are wrong” vs “I am right”
Buck Shlegeris
In My Culture
Duncan Sabien
Chris Olah’s views on AGI safety
Evan Hubinger
Understanding “Deep Double Descent”
Evan Hubinger
How to Ignore Your Emotions (while also thinking you’re awesome at emotions)
Hazard
Paper-Reading for Gears
John S. Wentworth
Book summary: Unlocking the Emotional Brain
Kaj Sotala
Noticing Frame Differences
Raymond Arnold
Propagating Facts into Aesthetics
Raymond Arnold
Do you fear the rock or the hard place?
Ruben Bloom
Mental Mountains
Scott Alexander
Steelmanning Divination
Vaniver
Modularity
Book Review: Design Principles of Biological Circuits
John S. Wentworth
Reframing Superintelligence: Comprehensive AI Services as General Intelligence
Rohin M. Shah
Building up to an Internal Family Systems model
Kaj Sotala
Being the (Pareto) Best in the World
John S. Wentworth
The Schelling Choice is “Rabbit”, not “Stag”
Raymond Arnold
Literature Review: Distributed Teams
Elizabeth Van Nostrand
Gears-Level Models are Capital Investments
John S. Wentworth
Evolution of Modularity
John S. Wentworth
You Have About Five Words
Raymond Arnold
Coherent decisions imply consistent utilities
Eliezer Yudkowsky
Alignment Research Field Guide
Abram Demski
Forum participation as a research strategy
Wei Dai
The Credit Assignment Problem
Abram Demski
Selection vs Control
Abram Demski
Incentives
Asymmetric Justice
Zvi Mowshowitz
The Copenhagen Interpretation of Ethics
Jai Dhyani
Unconscious Economics
Jacob Lagerros
Power Buys You Distance From The Crime
Elizabeth Van Nostrand
Seeking Power is Often Convergently Instrumental in MDPs
Alexander Turner & Logan Smith
Yes Requires the Possibility of No
Scott Garrabrant
Mistakes with Conservation of Expected Evidence
Abram Demski
Heads I Win, Tails?—Never Heard of Her; Or, Selective Reporting and the Tragedy of the Green Rationalists
Zack M. Davis
Excerpts from a larger discussion about simulacra
Ben Hoffman
Moloch Hasn’t Won
Zvi Mowshowitz
Integrity and accountability are core parts of rationality
Oliver Habryka
The Real Rules Have No Exceptions
Said Achmiz
Simple Rules of Law
Zvi Mowshowitz
The Amish, and Strategic Norms around Technology
Raymond Arnold
Risks from Learned Optimization: Introduction
Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, & Scott Garrabrant
Gradient hacking
Evan Hubinger
Failure
The Parable of Predict-O-Matic
Abram Demski
Blackmail
Zvi Mowshowitz
Bioinfohazards
Megan Crawford, Finan Adamson, & Jeffrey Ladish
What failure looks like
Paul Christiano
Seeking Power is Often Convergently Instrumental in MDPs
Alexander Turner & Logan Smith
AI Safety “Success Stories”
Wei Dai
Reframing Impact
Alexander Turner
The strategy-stealing assumption
Paul Christiano
Is Rationalist Self-Improvement Real?
Jacob Falkovich
The Curse Of The Counterfactual
P.J. Eby
human psycholinguists: a critical appraisal
Nostalgebraist
Why wasn’t science invented in China?
Ruben Bloom
Make more land
Jeff Kaufman
Rest Days vs Zombie Days
Lauren Lee
Here is a Google sheet.