Power Lies Trembling: a three-book review

Richard_Ngo

Feb 22, 2025, 10:57 PM

214 points

29 comments, 15 min read

(www.mindthefuture.info)

Eliezer’s Lost Alignment Articles / The Arbital Sequence

Ruby and RobertM

Feb 20, 2025, 12:48 AM

207 points

10 comments, 5 min read

How to Make Superbabies

GeneSmith and kman

Feb 19, 2025, 8:39 PM

605 points

349 comments, 31 min read

Levels of Friction

Zvi

Feb 10, 2025, 1:10 PM

148 points

8 comments, 12 min read

(thezvi.wordpress.com)

So You Want To Make Marginal Progress...

johnswentworth

Feb 7, 2025, 11:22 PM

286 points

42 comments, 4 min read

How AI Takeover Might Happen in 2 Years

joshc

Feb 7, 2025, 5:10 PM

422 points

137 comments, 29 min read

(x.com)

Some articles in “International Security” that I enjoyed

Buck

Jan 31, 2025, 4:23 PM

130 points

10 comments, 4 min read

“Sharp Left Turn” discourse: An opinionated review

Steven Byrnes

Jan 28, 2025, 6:47 PM

208 points

26 comments, 31 min read

The Case Against AI Control Research

johnswentworth

Jan 21, 2025, 4:03 PM

353 points

80 comments, 6 min read

The Gentle Romance

Richard_Ngo

Jan 19, 2025, 6:29 PM

242 points

46 comments, 15 min read

(www.asimov.press)

Don’t ignore bad vibes you get from people

Kaj_Sotala

Jan 18, 2025, 9:20 AM

152 points

50 comments, 2 min read

(kajsotala.fi)

What Is The Alignment Problem?

johnswentworth

Jan 16, 2025, 1:20 AM

180 points

50 comments, 25 min read

How will we update about scheming?

ryan_greenblatt

Jan 6, 2025, 8:21 PM

171 points

20 comments, 37 min read

Review: Planecrash

L Rudolf L

Dec 27, 2024, 2:18 PM

360 points

45 comments, 22 min read

(nosetgauge.substack.com)

A Three-Layer Model of LLM Psychology

Jan_Kulveit

Dec 26, 2024, 4:49 PM

217 points

13 comments, 8 min read

What Goes Without Saying

sarahconstantin

Dec 20, 2024, 6:00 PM

334 points

28 comments, 5 min read

(sarahconstantin.substack.com)

When Is Insurance Worth It?

kqr

Dec 19, 2024, 7:07 PM

175 points

71 comments, 4 min read

(entropicthoughts.com)

Alignment Faking in Large Language Models

ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman and Buck

Dec 18, 2024, 5:19 PM

483 points

75 comments, 10 min read

AIs Will Increasingly Attempt Shenanigans

Zvi

Dec 16, 2024, 3:20 PM

114 points

2 comments, 26 min read

(thezvi.wordpress.com)

Biological risk from the mirror world

jasoncrawford

Dec 12, 2024, 7:07 PM

334 points

38 comments, 7 min read

(newsletter.rootsofprogress.org)