AI companies should publish security assessments
ryan_greenblatt · 27 Apr 2026 14:39 UTC · 41 points · 1 comment · 3 min read · LW link

In defense of parents
Yair Halberstadt · 27 Apr 2026 14:18 UTC · 34 points · 1 comment · 6 min read · LW link

Curious cases of financial engineering in biotech
Abhishaike Mahajan · 27 Apr 2026 14:09 UTC · 18 points · 0 comments · 22 min read · LW link (www.owlposting.com)

Blackmail at 8 Billion Parameters: Agentic Misalignment in Sub-Frontier Models
Chijioke Ugwuanyi · 27 Apr 2026 8:59 UTC · 12 points · 0 comments · 7 min read · LW link

The other paper that killed deep learning theory
LawrenceC · 27 Apr 2026 6:57 UTC · 34 points · 1 comment · 8 min read · LW link

AI might surprise itself by going rogue
David Scott Krueger · 27 Apr 2026 6:30 UTC · 6 points · 0 comments · 2 min read · LW link (therealartificialintelligence.substack.com)

How does Reinforcement Learning Affect Models
humanityfirst · 27 Apr 2026 5:22 UTC · 2 points · 1 comment · 2 min read · LW link

Retrospective on my unsupervised elicitation challenge
DanielFilan · 27 Apr 2026 0:30 UTC · 52 points · 0 comments · 8 min read · LW link (danielfilan.com)

Alignment Faking Replication and Chain-of-Thought Monitoring Extensions
Angela Tang · 26 Apr 2026 23:55 UTC · 7 points · 0 comments · 8 min read · LW link

Training a Transformer to Compose One Step Per Layer (and Proving It)
Brendan Long · 26 Apr 2026 23:45 UTC · 16 points · 0 comments · 7 min read · LW link

AI for life strategy advice: a personal experiment
Jonah Wilberg · 26 Apr 2026 22:18 UTC · 10 points · 2 comments · 6 min read · LW link

Spontaneous introspection in output tampering
Ziqian Zhong · 26 Apr 2026 20:05 UTC · 12 points · 0 comments · 12 min read · LW link

How do intentional secret loyalties differ from other schemer motivations?
Cleo Nardo · 26 Apr 2026 20:03 UTC · 24 points · 1 comment · 12 min read · LW link

Control protocols don’t always need to know which models are scheming
Fabien Roger · 26 Apr 2026 19:16 UTC · 38 points · 1 comment · 6 min read · LW link

“Bad faith” means intentionally misrepresenting your beliefs
TFD · 26 Apr 2026 19:07 UTC · 35 points · 15 comments · 6 min read · LW link

Me, decay
Dentosal · 26 Apr 2026 17:14 UTC · 8 points · 1 comment · 3 min read · LW link

Universes can specialize: Each universe should produce the goods it’s most comparatively advantaged at, relative to the multiversal market
Zach Stein-Perlman · 26 Apr 2026 16:30 UTC · 12 points · 12 comments · 2 min read · LW link

Why did people miss the point on Mythos?
draganover · 26 Apr 2026 12:15 UTC · 44 points · 13 comments · 5 min read · LW link

Roko’s Basilisk may work on humans
Horosphere · 26 Apr 2026 9:40 UTC · 1 point · 4 comments · 18 min read · LW link

Substrate: Formalism
Vardhan and mfatt · 26 Apr 2026 8:06 UTC · 2 points · 0 comments · 10 min read · LW link