
Is Bayesianism Susceptible to the Mail-Order Prophet Scam?

Max Kaye · 31 Mar 2026 19:30 UTC
16 points
16 comments · 1 min read

Product Alignment is not Superintelligence Alignment (and we need the latter to survive)

plex · 31 Mar 2026 16:53 UTC
79 points
5 comments · 2 min read

Experiments With Opus 4.6’s Fiction

Tomás B. · 31 Mar 2026 15:38 UTC
34 points
10 comments · 13 min read

AI Safety Manual

humanityfirst · 31 Mar 2026 4:21 UTC
1 point
0 comments · 10 min read

What it’s like to be an AI safety grantmaker (and why we need more of them)

Julian Hazell · 31 Mar 2026 3:34 UTC
1 point
0 comments · 11 min read
(thirdthing.ai)

Arcta via est—the Narrow path out of gradual disempowerment

Founder-ArcaFutura · 31 Mar 2026 3:06 UTC
2 points
2 comments · 15 min read

Your AI Travel agent would book you a bullfight: benchmarking implicit animal compassion in Agentic AI

31 Mar 2026 1:34 UTC
5 points
0 comments · 5 min read

Slack in Cells, Slack in Brains

Mateusz Bagiński · 31 Mar 2026 0:35 UTC
37 points
3 comments · 6 min read

Take note of how brightness makes you feel

Adam Zerner · 30 Mar 2026 23:48 UTC
27 points
5 comments · 3 min read

A Mirror Test For LLMs

Christopher Ackerman · 30 Mar 2026 22:44 UTC
6 points
0 comments · 24 min read

God Can Send An Email

AlphaAndOmega · 30 Mar 2026 22:36 UTC
17 points
12 comments · 9 min read

On Badness of Death

MarkelKori · 30 Mar 2026 20:34 UTC
0 points
1 comment · 3 min read

How to Solve Secure Program Synthesis

30 Mar 2026 20:12 UTC
19 points
0 comments · 11 min read

Blocking live failures with synchronous monitors

30 Mar 2026 17:44 UTC
22 points
0 comments · 4 min read
(open.substack.com)

A Guide to the Theory of Appropriateness Papers

Joel Z. Leibo · 30 Mar 2026 16:56 UTC
3 points
1 comment · 1 min read

AI should be a good citizen, not just a good assistant

30 Mar 2026 14:32 UTC
37 points
11 comments · 9 min read
(www.forethought.org)

My One-Year-Old Pre­dic­tions for What the World Will Look Like in 3 Years

Ihor Kendiukhov · 30 Mar 2026 13:56 UTC
12 points
2 comments · 3 min read

Propositional Alignment

williawa · 30 Mar 2026 13:50 UTC
5 points
0 comments · 2 min read

The state of AI safety in four fake graphs

Boaz Barak · 30 Mar 2026 13:21 UTC
76 points
30 comments · 2 min read

(Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL

30 Mar 2026 10:56 UTC
105 points
3 comments · 17 min read