On revolutionary love in AI safety

Troy Tian
22 Jun 2026 3:48 UTC
2 points
0 comments
4 min read
LW link

Do AI Biorisk Thresholds Need Intermediate Warning Levels?

Lukas Frei
22 Jun 2026 1:09 UTC
9 points
0 comments
3 min read
LW link

NLA explanations can be shortened without harming reconstruction

loops
22 Jun 2026 0:57 UTC
22 points
2 comments
3 min read
LW link

Introducing MonitoringBench

monika_j
21 Jun 2026 18:43 UTC
33 points
0 comments
6 min read
LW link

How persona training could fail

Simon Lermen
21 Jun 2026 16:38 UTC
12 points
0 comments
4 min read
LW link

A high-level model of AI bargaining

Anthony DiGiovanni
21 Jun 2026 15:37 UTC
11 points
1 comment
5 min read
LW link

Policy changes should be rolled out gradually

Yair Halberstadt
21 Jun 2026 11:07 UTC
24 points
2 comments
3 min read
LW link

A misalignment taxonomy

Alec Harris
21 Jun 2026 10:20 UTC
12 points
2 comments
3 min read
LW link

The Cookie Monster Explains AI Safety

michaelwaves
21 Jun 2026 0:52 UTC
12 points
2 comments
2 min read
LW link

How are there 0 studies (maybe 1) on sex-concordant hormone therapy?

Util
20 Jun 2026 22:36 UTC
14 points
0 comments
3 min read
LW link

Against Planet-Eating Nanoreplicators

SurvivalBias
20 Jun 2026 20:27 UTC
10 points
7 comments
5 min read
LW link

How transparent is DiffusionGemma (and why it matters)

20 Jun 2026 20:05 UTC
71 points
2 comments
4 min read
LW link

The Invisible Side of AI Governance

Charbel-Raphaël
20 Jun 2026 18:54 UTC
94 points
4 comments
14 min read
LW link

Would anybody here be interested in a “mistake postmortem” discussion group?

SK2
20 Jun 2026 12:03 UTC
47 points
7 comments
4 min read
LW link

The LLM shoggoth meme is weirder than you think

HedonicEscalator
19 Jun 2026 23:35 UTC
126 points
8 comments
7 min read
LW link
(hedonicescalator.substack.com)

How I think developers of frontier AI systems and regulators ought to act in the face of existential AI risk

WilliamKiely
19 Jun 2026 22:22 UTC
12 points
0 comments
12 min read
LW link

Hyperstition as the Natural Enemy of Rationality

alseph
19 Jun 2026 21:12 UTC
29 points
8 comments
3 min read
LW link

World-modeling the US vs. Anthropic Standoff on Claude Fable

dschwarz
19 Jun 2026 20:04 UTC
20 points
4 comments
8 min read
LW link

Thoughts on Likelihood of Existential Risks by Misaligned AIs

Ishan Khire
19 Jun 2026 19:17 UTC
3 points
0 comments
6 min read
LW link
(ishankhire.substack.com)

Why should AI be moral?

Zach Thornton
19 Jun 2026 19:13 UTC
12 points
3 comments
9 min read
LW link