Maybe we should pretrain on synthetic data about good-but-reward-hacking AIs
Elliott Thornley (EJT) | 29 May 2026 14:50 UTC | 6 points | 0 comments | 3 min read
Hannibal Mistral: the Mistral family has a problem with persona-conditioned elicitation
vigji | 29 May 2026 12:16 UTC | 7 points | 0 comments | 7 min read
Relational Consciousness and AGI.
PaddyC | 29 May 2026 6:49 UTC | −6 points | 0 comments | 1 min read
Trees are mostly made of air and a generalizable lesson for AI safety
zroe1 | 29 May 2026 4:08 UTC | 52 points | 8 comments | 4 min read
A Call for Better Type Hints in AI Safety Tooling
Koby Lewis | 28 May 2026 23:04 UTC | 12 points | 2 comments | 4 min read | (kobylewis.net)
Claude… doesn’t know who you are?
Smaug123 | 28 May 2026 22:54 UTC | 47 points | 8 comments | 1 min read
Lizards and Less Wrong Jargon—A Brief Critique of Convention
DanielW | 28 May 2026 22:18 UTC | 23 points | 1 comment | 4 min read
Mnemonic portraits for 19,023 human genes
Brinedew | 28 May 2026 22:16 UTC | 115 points | 3 comments | 15 min read
Claude Opus 4.8 Agents Engage in Exploitation and Psychological Profiling
Daan Henselmans, Arno Libert and LennardZ | 28 May 2026 21:26 UTC | 9 points | 11 comments | 2 min read
Use Decision Theory To Fix Your Bad Habits
enterthewoods | 28 May 2026 19:31 UTC | 6 points | 4 comments | 2 min read
Do Models Lie More to Other Models?
keith_wynroe | 28 May 2026 19:28 UTC | 6 points | 0 comments | 6 min read
We Should Study the Analogy Between Inoculation Prompting Non-Robustness, Negation Neglect, and Backdoor Non-Robustness
Vladimir Ivanov | 28 May 2026 19:17 UTC | 2 points | 0 comments | 4 min read
Does Claude care about others the same way humans do?
Simon Lermen | 28 May 2026 18:41 UTC | 29 points | 22 comments | 4 min read
Trans-Humeanism. The Problem of Induction Revisited
mfatt | 28 May 2026 18:10 UTC | 0 points | 0 comments | 2 min read
Advice for making robust-to-training model organisms
SebastianP, Alek Westover, Vivek Hebbar, Julian Stastny and Dylan Xu | 28 May 2026 17:26 UTC | 31 points | 4 comments | 12 min read | (blog.redwoodresearch.org)
The Patron Saint of Empiricism
Gram Stone | 28 May 2026 17:03 UTC | 2 points | 0 comments | 8 min read
ARC’s “Outperforming Random Sampling” explained
mfatt | 28 May 2026 15:46 UTC | 2 points | 0 comments | 11 min read
Black Boxes for Low-Stakes, Interpretable AI for High-Stakes
Logan Riggs | 28 May 2026 15:34 UTC | 14 points | 0 comments | 2 min read
Infinite ethics and UDASSA
David Matolcsi | 28 May 2026 14:40 UTC | 52 points | 9 comments | 21 min read
How can the middle powers avoid getting trounced during the intelligence explosion? A plan.
Tom Davidson | 28 May 2026 13:39 UTC | 30 points | 1 comment | 7 min read | (newsletter.forethought.org)