The Case Against AI Control Research

johnswentworth · Jan 21, 2025, 4:03 PM
352 points
80 comments · 6 min read

What’s the short timeline plan?

Marius Hobbhahn · Jan 2, 2025, 2:59 PM
351 points
49 comments · 23 min read

The Gentle Romance

Richard_Ngo · Jan 19, 2025, 6:29 PM
231 points
46 comments · 15 min read
(www.asimov.press)

“Sharp Left Turn” discourse: An opinionated review

Steven Byrnes · Jan 28, 2025, 6:47 PM
205 points
26 comments · 31 min read

Mechanisms too simple for humans to design

Malmesbury · Jan 22, 2025, 4:54 PM
202 points
45 comments · 15 min read

Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals

Jan 24, 2025, 8:20 PM
178 points
61 comments · 5 min read

What Is The Alignment Problem?

johnswentworth · Jan 16, 2025, 1:20 AM
178 points
50 comments · 25 min read

How will we update about scheming?

ryan_greenblatt · Jan 6, 2025, 8:21 PM
169 points
20 comments · 36 min read

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

Jan 30, 2025, 5:03 PM
162 points
57 comments · 2 min read
(gradual-disempowerment.ai)

Maximizing Communication, not Traffic

jefftk · Jan 5, 2025, 1:00 PM
161 points
10 comments · 1 min read
(www.jefftk.com)

Don’t ignore bad vibes you get from people

Kaj_Sotala · Jan 18, 2025, 9:20 AM
150 points
50 comments · 2 min read
(kajsotala.fi)

Quotes from the Stargate press conference

Nikola Jurkovic · Jan 22, 2025, 12:50 AM
149 points
7 comments · 1 min read
(www.c-span.org)

OpenAI #10: Reflections

Zvi · Jan 7, 2025, 5:00 PM
149 points
7 comments · 11 min read
(thezvi.wordpress.com)

Capital Ownership Will Not Prevent Human Disempowerment

beren · Jan 5, 2025, 6:00 AM
148 points
18 comments · 14 min read

Activation space interpretability may be doomed

Jan 8, 2025, 12:49 PM
147 points
32 comments · 8 min read

AI companies are unlikely to make high-assurance safety cases if timelines are short

ryan_greenblatt · Jan 23, 2025, 6:41 PM
145 points
5 comments · 13 min read

Applying traditional economic thinking to AGI: a trilemma

Steven Byrnes · Jan 13, 2025, 1:23 AM
144 points
32 comments · 3 min read

Human takeover might be worse than AI takeover

Tom Davidson · Jan 10, 2025, 4:53 PM
140 points
55 comments · 8 min read

Ten people on the inside

Buck · Jan 28, 2025, 4:41 PM
139 points
28 comments · 4 min read

What Indicators Should We Watch to Disambiguate AGI Timelines?

snewman · Jan 6, 2025, 7:57 PM
139 points
57 comments · 13 min read

Planning for Extreme AI Risks

joshc · Jan 29, 2025, 6:33 PM
139 points
5 comments · 16 min read

[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty

tandem · Jan 7, 2025, 7:11 PM
137 points
5 comments · 1 min read

Anomalous Tokens in DeepSeek-V3 and r1

henry · Jan 25, 2025, 10:55 PM
136 points
2 comments · 7 min read

Training on Documents About Reward Hacking Induces Reward Hacking

Jan 21, 2025, 9:32 PM
131 points
15 comments · 2 min read
(alignment.anthropic.com)

Tell me about yourself: LLMs are aware of their learned behaviors

Jan 22, 2025, 12:47 AM
129 points
5 comments · 6 min read

Building AI Research Fleets

Jan 12, 2025, 6:23 PM
129 points
11 comments · 5 min read

Parkinson’s Law and the Ideology of Statistics

Benquo · Jan 4, 2025, 3:49 PM
127 points
7 comments · 8 min read
(benjaminrosshoffman.com)

The Intelligence Curse

lukedrago · Jan 3, 2025, 7:07 PM
124 points
26 comments · 18 min read
(lukedrago.substack.com)

2024 in AI predictions

jessicata · Jan 1, 2025, 8:29 PM
117 points
3 comments · 8 min read

The Game Board has been Flipped: Now is a good time to rethink what you’re doing

LintzA · Jan 28, 2025, 11:36 PM
112 points
30 comments · 13 min read

Aristocracy and Hostage Capital

Arjun Panickssery · Jan 8, 2025, 7:38 PM
108 points
7 comments · 3 min read
(arjunpanickssery.substack.com)

Attribution-based parameter decomposition

Jan 25, 2025, 1:12 PM
107 points
21 comments · 4 min read
(publications.apolloresearch.ai)

My supervillain origin story

Dmitry Vaintrob · Jan 27, 2025, 12:20 PM
106 points
1 comment · 5 min read

How do you deal w/ Super Stimuli?

Logan Riggs · Jan 14, 2025, 3:14 PM
106 points
25 comments · 3 min read

Fake thinking and real thinking

Joe Carlsmith · Jan 28, 2025, 8:05 PM
106 points
11 comments · 38 min read

Comment on “Death and the Gorgon”

Zack_M_Davis · Jan 1, 2025, 5:47 AM
99 points
33 comments · 8 min read

The purposeful drunkard

Dmitry Vaintrob · Jan 12, 2025, 12:27 PM
98 points
13 comments · 6 min read

Reasons for and against working on technical AI safety at a frontier AI lab

bilalchughtai · Jan 5, 2025, 2:49 PM
97 points
12 comments · 12 min read

On Eating the Sun

jessicata · Jan 8, 2025, 4:57 AM
94 points
96 comments · 3 min read
(unstablerontology.substack.com)

We probably won’t just play status games with each other after AGI

Matthew Barnett · Jan 15, 2025, 4:56 AM
94 points
21 comments · 4 min read

Tips and Code for Empirical Research Workflows

Jan 20, 2025, 10:31 PM
94 points
14 comments · 20 min read

The subset parity learning problem: much more than you wanted to know

Dmitry Vaintrob · Jan 3, 2025, 9:13 AM
93 points
18 comments · 11 min read

Introducing Squiggle AI

ozziegooen · Jan 3, 2025, 5:53 PM
92 points
15 comments

The Rising Sea

Jesse Hoogland · Jan 25, 2025, 8:48 PM
91 points
2 comments · 2 min read

Six Thoughts on AI Safety

boazbarak · Jan 24, 2025, 10:20 PM
91 points
55 comments · 15 min read

Implications of the inference scaling paradigm for AI safety

Ryan Kidd · Jan 14, 2025, 2:14 AM
91 points
70 comments · 5 min read

Thoughts on the conservative assumptions in AI control

Buck · Jan 17, 2025, 7:23 PM
90 points
5 comments · 13 min read

Tips On Empirical Research Slides

Jan 8, 2025, 5:06 AM
90 points
4 comments · 6 min read

Agent Foundations 2025 at CMU

Jan 19, 2025, 11:48 PM
90 points
10 comments · 1 min read

Five Recent AI Tutoring Studies

Arjun Panickssery · Jan 19, 2025, 3:53 AM
88 points
0 comments · 2 min read
(arjunpanickssery.substack.com)