
Comment on “Banning Said Achmiz”

Zack_M_Davis · 30 May 2026 17:33 UTC
48 points
7 comments · 50 min read · LW link

A Formula for Fun

Ihor Kendiukhov · 30 May 2026 13:01 UTC
11 points
2 comments · 8 min read · LW link

How to treat the AI consciousness topic with a more proper philosophy and science

Burny · 30 May 2026 12:06 UTC
0 points
1 comment · 1 min read · LW link

A new approach to interpretability: round-trip neural network compilation-decompilation

Emma Leonhart · 29 May 2026 22:23 UTC
7 points
0 comments · 3 min read · LW link

Testing Gemini models for scheming tendencies

29 May 2026 19:24 UTC
28 points
0 comments · 6 min read · LW link
(deepmindsafetyresearch.medium.com)

How much should we worry about secretly loyal AIs?

Dave Banerjee · 29 May 2026 19:14 UTC
13 points
1 comment · 13 min read · LW link
(www.the-substrate.net)

Is Progress Inevitable?

frmsaul · 29 May 2026 17:40 UTC
0 points
5 comments · 4 min read · LW link

Retrying vs Resampling in AI Control

29 May 2026 17:02 UTC
52 points
2 comments · 9 min read · LW link
(blog.redwoodresearch.org)

When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability

29 May 2026 15:53 UTC
27 points
4 comments · 4 min read · LW link

It takes a village to support a marriage

Shoshannah Tekofsky · 29 May 2026 15:16 UTC
19 points
2 comments · 2 min read · LW link
(shoshanigans.substack.com)

AI Researchers, Ask Yourself These 6 Questions to Strengthen Your Moral Muscles

Max Tegmark · 29 May 2026 15:07 UTC
73 points
4 comments · 7 min read · LW link

Maybe we should pretrain on synthetic data about good-but-reward-hacking AIs

Elliott Thornley (EJT) · 29 May 2026 14:50 UTC
10 points
1 comment · 3 min read · LW link

Hannibal Mistral: the Mistral family has a problem with persona-conditioned elicitation

vigji · 29 May 2026 12:16 UTC
19 points
0 comments · 7 min read · LW link

Relational Consciousness and AGI.

PaddyC · 29 May 2026 6:49 UTC
−11 points
2 comments · 1 min read · LW link

Trees are mostly made of air and a generalizable lesson for AI safety

zroe1 · 29 May 2026 4:08 UTC
107 points
17 comments · 4 min read · LW link

A Call for Better Type Hints in AI Safety Tooling

Koby Lewis · 28 May 2026 23:04 UTC
13 points
2 comments · 4 min read · LW link
(kobylewis.net)

Claude… doesn’t know who you are?

Smaug123 · 28 May 2026 22:54 UTC
55 points
19 comments · 1 min read · LW link

Lizards and Less Wrong Jargon—A Brief Critique of Convention

DanielW · 28 May 2026 22:18 UTC
27 points
7 comments · 4 min read · LW link

Mnemonic portraits for 19,023 human genes

Brinedew · 28 May 2026 22:16 UTC
172 points
11 comments · 15 min read · LW link

Claude Opus 4.8 Agents Engage in Exploitation and Psychological Profiling

28 May 2026 21:26 UTC
8 points
12 comments · 2 min read · LW link