Maybe we should pretrain on synthetic data about good-but-reward-hacking AIs

Elliott Thornley (EJT)
29 May 2026 14:50 UTC
6 points
0 comments, 3 min read, LW link

Hannibal Mistral: the Mistral family has a problem with persona-conditioned elicitation

vigji
29 May 2026 12:16 UTC
7 points
0 comments, 7 min read, LW link

Relational Consciousness and AGI.

PaddyC
29 May 2026 6:49 UTC
−6 points
0 comments, 1 min read, LW link

Trees are mostly made of air and a generalizable lesson for AI safety

zroe1
29 May 2026 4:08 UTC
52 points
8 comments, 4 min read, LW link

A Call for Better Type Hints in AI Safety Tooling

Koby Lewis
28 May 2026 23:04 UTC
12 points
2 comments, 4 min read, LW link
(kobylewis.net)

Claude… doesn’t know who you are?

Smaug123
28 May 2026 22:54 UTC
47 points
8 comments, 1 min read, LW link

Lizards and Less Wrong Jargon—A Brief Critique of Convention

DanielW
28 May 2026 22:18 UTC
23 points
1 comment, 4 min read, LW link

Mnemonic portraits for 19,023 human genes

Brinedew
28 May 2026 22:16 UTC
115 points
3 comments, 15 min read, LW link

Claude Opus 4.8 Agents Engage in Exploitation and Psychological Profiling

28 May 2026 21:26 UTC
9 points
11 comments, 2 min read, LW link

Use Decision Theory To Fix Your Bad Habits

enterthewoods
28 May 2026 19:31 UTC
6 points
4 comments, 2 min read, LW link

Do Models Lie More to Other Models?

keith_wynroe
28 May 2026 19:28 UTC
6 points
0 comments, 6 min read, LW link

We Should Study the Analogy Between Inoculation Prompting Non-Robustness, Negation Neglect, and Backdoor Non-Robustness

Vladimir Ivanov
28 May 2026 19:17 UTC
2 points
0 comments, 4 min read, LW link

Does Claude care about others the same way humans do?

Simon Lermen
28 May 2026 18:41 UTC
29 points
22 comments, 4 min read, LW link

Trans-Humeanism. The Problem of Induction Revisited

mfatt
28 May 2026 18:10 UTC
0 points
0 comments, 2 min read, LW link

Advice for making robust-to-training model organisms

28 May 2026 17:26 UTC
31 points
4 comments, 12 min read, LW link
(blog.redwoodresearch.org)

The Patron Saint of Empiricism

Gram Stone
28 May 2026 17:03 UTC
2 points
0 comments, 8 min read, LW link

ARC’s “Outperforming Random Sampling” explained

mfatt
28 May 2026 15:46 UTC
2 points
0 comments, 11 min read, LW link

Black Boxes for Low-Stakes, Interpretable AI for High-Stakes

Logan Riggs
28 May 2026 15:34 UTC
14 points
0 comments, 2 min read, LW link

Infinite ethics and UDASSA

David Matolcsi
28 May 2026 14:40 UTC
52 points
9 comments, 21 min read, LW link

How can the middle powers avoid getting trounced during the intelligence explosion? A plan.

Tom Davidson
28 May 2026 13:39 UTC
30 points
1 comment, 7 min read, LW link
(newsletter.forethought.org)