How to Make Superbabies

Feb 19, 2025, 8:39 PM
591 points
337 comments · 31 min read · LW link

How AI Takeover Might Happen in 2 Years

joshc · Feb 7, 2025, 5:10 PM
416 points
137 comments · 29 min read · LW link
(x.com)

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025, 5:39 PM
328 points
91 comments · 4 min read · LW link

Murder plots are infohazards

Chris Monteiro · Feb 13, 2025, 7:15 PM
300 points
44 comments · 2 min read · LW link

So You Want To Make Marginal Progress...

johnswentworth · Feb 7, 2025, 11:22 PM
284 points
42 comments · 4 min read · LW link

Arbital has been imported to LessWrong

Feb 20, 2025, 12:47 AM
279 points
30 comments · 5 min read · LW link

A History of the Future, 2025-2040

L Rudolf L · Feb 17, 2025, 12:03 PM
231 points
41 comments · 75 min read · LW link
(nosetgauge.substack.com)

Power Lies Trembling: a three-book review

Richard_Ngo · Feb 22, 2025, 10:57 PM
211 points
27 comments · 15 min read · LW link
(www.mindthefuture.info)

Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?

garrison · Feb 11, 2025, 12:20 AM
208 points
8 comments · LW link
(garrisonlovely.substack.com)

Eliezer’s Lost Alignment Articles / The Arbital Sequence

Feb 20, 2025, 12:48 AM
207 points
9 comments · 5 min read · LW link

[Question] Have LLMs Generated Novel Insights?

Feb 23, 2025, 6:22 PM
155 points
36 comments · 2 min read · LW link

It’s been ten years. I propose HPMOR Anniversary Parties.

Screwtape · Feb 16, 2025, 1:43 AM
153 points
3 comments · 1 min read · LW link

Levels of Friction

Zvi · Feb 10, 2025, 1:10 PM
148 points
8 comments · 12 min read · LW link
(thezvi.wordpress.com)

The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better

Thane Ruthenis · Feb 21, 2025, 8:15 PM
148 points
51 comments · 6 min read · LW link

A computational no-coincidence principle

Eric Neyman · Feb 14, 2025, 9:39 PM
148 points
38 comments · 6 min read · LW link
(www.alignment.org)

The Paris AI Anti-Safety Summit

Zvi · Feb 12, 2025, 2:00 PM
129 points
21 comments · 21 min read · LW link
(thezvi.wordpress.com)

Gradual Disempowerment, Shell Games and Flinches

Jan_Kulveit · Feb 2, 2025, 2:47 PM
126 points
36 comments · 6 min read · LW link

Research directions Open Phil wants to fund in technical AI safety

Feb 8, 2025, 1:40 AM
116 points
21 comments · 58 min read · LW link
(www.openphilanthropy.org)

The News is Never Neglected

lsusr · Feb 11, 2025, 2:59 PM
111 points
18 comments · 1 min read · LW link

Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas

Feb 6, 2025, 6:58 PM
111 points
0 comments · 1 min read · LW link
(www.openphilanthropy.org)

Two hemispheres—I do not think it means what you think it means

Viliam · Feb 9, 2025, 3:33 PM
108 points
21 comments · 14 min read · LW link

You can just wear a suit

lsusr · Feb 26, 2025, 2:57 PM
108 points
48 comments · 2 min read · LW link

My model of what is going on with LLMs

Cole Wyeth · Feb 13, 2025, 3:43 AM
104 points
49 comments · 7 min read · LW link

Judgements: Merging Prediction & Evidence

abramdemski · Feb 23, 2025, 7:35 PM
103 points
5 comments · 6 min read · LW link

Detecting Strategic Deception Using Linear Probes

Feb 6, 2025, 3:46 PM
102 points
9 comments · 2 min read · LW link
(arxiv.org)

AGI Safety & Alignment @ Google DeepMind is hiring

Rohin Shah · Feb 17, 2025, 9:11 PM
102 points
19 comments · 10 min read · LW link

A short course on AGI safety from the GDM Alignment team

Feb 14, 2025, 3:43 PM
101 points
1 comment · 1 min read · LW link
(deepmindsafetyresearch.medium.com)

C’mon guys, Deliberate Practice is Real

Raemon · Feb 5, 2025, 10:33 PM
98 points
25 comments · 9 min read · LW link

Timaeus in 2024

Feb 20, 2025, 11:54 PM
96 points
1 comment · 8 min read · LW link

Reviewing LessWrong: Screwtape’s Basic Answer

Screwtape · Feb 5, 2025, 4:30 AM
96 points
18 comments · 6 min read · LW link

Dear AGI,

Nathan Young · Feb 18, 2025, 10:48 AM
90 points
11 comments · 3 min read · LW link

Wired on: “DOGE personnel with admin access to Federal Payment System”

Raemon · Feb 5, 2025, 9:32 PM
88 points
45 comments · 2 min read · LW link
(web.archive.org)

Anthropic releases Claude 3.7 Sonnet with extended thinking mode

LawrenceC · Feb 24, 2025, 7:32 PM
88 points
8 comments · 4 min read · LW link
(www.anthropic.com)

The Risk of Gradual Disempowerment from AI

Zvi · Feb 5, 2025, 10:10 PM
86 points
15 comments · 20 min read · LW link
(thezvi.wordpress.com)

Voting Results for the 2023 Review

Raemon · Feb 6, 2025, 8:00 AM
86 points
3 comments · 69 min read · LW link

How might we safely pass the buck to AI?

joshc · Feb 19, 2025, 5:48 PM
83 points
58 comments · 31 min read · LW link

Ambiguous out-of-distribution generalization on an algorithmic task

Feb 13, 2025, 6:24 PM
83 points
6 comments · 11 min read · LW link

The Mask Comes Off: A Trio of Tales

Zvi · Feb 14, 2025, 3:30 PM
81 points
1 comment · 13 min read · LW link
(thezvi.wordpress.com)

Microplastics: Much Less Than You Wanted To Know

Feb 15, 2025, 7:08 PM
80 points
8 comments · 13 min read · LW link

[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

Lucy Farnik · Feb 26, 2025, 12:50 PM
79 points
8 comments · 7 min read · LW link

OpenAI releases deep research agent

Seth Herd · Feb 3, 2025, 12:48 PM
78 points
21 comments · 3 min read · LW link
(openai.com)

Pick two: concise, comprehensive, or clear rules

Screwtape · Feb 3, 2025, 6:39 AM
78 points
27 comments · 8 min read · LW link

Evaluating “What 2026 Looks Like” So Far

Jonny Spicer · Feb 24, 2025, 6:55 PM
77 points
5 comments · 7 min read · LW link

Anti-Slop Interventions?

abramdemski · Feb 4, 2025, 7:50 PM
76 points
33 comments · 6 min read · LW link

The Simplest Good

Jesse Hoogland · Feb 2, 2025, 7:51 PM
75 points
6 comments · 5 min read · LW link

MATS Applications + Research Directions I’m Currently Excited About

Neel Nanda · Feb 6, 2025, 11:03 AM
73 points
7 comments · 8 min read · LW link

Osaka

lsusr · Feb 26, 2025, 1:50 PM
72 points
11 comments · 1 min read · LW link

A Problem to Solve Before Building a Deception Detector

Feb 7, 2025, 7:35 PM
71 points
12 comments · 14 min read · LW link

Thermodynamic entropy = Kolmogorov complexity

Aram Ebtekar · Feb 17, 2025, 5:56 AM
70 points
12 comments · 1 min read · LW link
(doi.org)

Language Models Use Trigonometry to Do Addition

Subhash Kantamneni · Feb 5, 2025, 1:50 PM
70 points
1 comment · 10 min read · LW link