Comment on “Banning Said Achmiz”

Zack_M_Davis · 30 May 2026 17:33 UTC

48 points

7 comments · 50 min read

A Formula for Fun

Ihor Kendiukhov · 30 May 2026 13:01 UTC

11 points

2 comments · 8 min read

How to treat the AI consciousness topic with a more proper philosophy and science

Burny · 30 May 2026 12:06 UTC

0 points

1 comment · 1 min read

A new approach to interpretability: round-trip neural network compilation-decompilation

Emma Leonhart · 29 May 2026 22:23 UTC

7 points

0 comments · 3 min read

Testing Gemini models for scheming tendencies

Vika, David Lindner, Seb Farquhar and Rohin Shah

29 May 2026 19:24 UTC

28 points

0 comments · 6 min read

(deepmindsafetyresearch.medium.com)

How much should we worry about secretly loyal AIs?

Dave Banerjee · 29 May 2026 19:14 UTC

13 points

1 comment · 13 min read

(www.the-substrate.net)

Is Progress Inevitable?

frmsaul · 29 May 2026 17:40 UTC

0 points

5 comments · 4 min read

Retrying vs Resampling in AI Control

james.lucassen and Adam Kaufman

29 May 2026 17:02 UTC

52 points

2 comments · 9 min read

(blog.redwoodresearch.org)

When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability

Logan Riggs, tdooms, Conflux, lwroe and MLNissenGonzalez

29 May 2026 15:53 UTC

27 points

4 comments · 4 min read

It takes a village to support a marriage

Shoshannah Tekofsky · 29 May 2026 15:16 UTC

19 points

2 comments · 2 min read

(shoshanigans.substack.com)

AI Researchers, Ask Yourself These 6 Questions to Strengthen Your Moral Muscles

Max Tegmark · 29 May 2026 15:07 UTC

73 points

4 comments · 7 min read

Maybe we should pretrain on synthetic data about good-but-reward-hacking AIs

Elliott Thornley (EJT) · 29 May 2026 14:50 UTC

10 points

1 comment · 3 min read

Hannibal Mistral: the Mistral family has a problem with persona-conditioned elicitation

vigji · 29 May 2026 12:16 UTC

19 points

0 comments · 7 min read

Relational Consciousness and AGI.

PaddyC · 29 May 2026 6:49 UTC

−11 points

2 comments · 1 min read

Trees are mostly made of air and a generalizable lesson for AI safety

zroe1 · 29 May 2026 4:08 UTC

107 points

17 comments · 4 min read

A Call for Better Type Hints in AI Safety Tooling

Koby Lewis · 28 May 2026 23:04 UTC

13 points

2 comments · 4 min read

(kobylewis.net)

Claude… doesn’t know who you are?

Smaug123 · 28 May 2026 22:54 UTC

55 points

19 comments · 1 min read

Lizards and Less Wrong Jargon—A Brief Critique of Convention

DanielW · 28 May 2026 22:18 UTC

27 points

7 comments · 4 min read

Mnemonic portraits for 19,023 human genes

Brinedew · 28 May 2026 22:16 UTC

172 points

11 comments · 15 min read

Claude Opus 4.8 Agents Engage in Exploitation and Psychological Profiling

Daan Henselmans, Arno Libert and LennardZ

28 May 2026 21:26 UTC

8 points

12 comments · 2 min read