Sorry, I still think kidney donation makes no sense for an EA (nicholashalden, 22 Nov 2025 18:10 UTC) · −1 points · 1 comment · 1 min read · LW link (substack.com)
Automatic alt text generation (TurnTrout, 22 Nov 2025 17:57 UTC) · 18 points · 1 comment · 1 min read · LW link (turntrout.com)
Introspection in LLMs: A Proposal For How To Think About It, And Test For It (Christopher Ackerman, 22 Nov 2025 14:52 UTC) · 5 points · 0 comments · 7 min read · LW link
Book Review: Wizard’s Hall (Screwtape, 22 Nov 2025 7:38 UTC) · 52 points · 0 comments · 5 min read · LW link
Be Naughty (habryka, 22 Nov 2025 6:35 UTC) · 57 points · 3 comments · 4 min read · LW link
Market Logic I (abramdemski, 22 Nov 2025 6:01 UTC) · 31 points · 2 comments · 5 min read · LW link
Animal welfare concerns are dominated by post-ASI futures (RobertM, 22 Nov 2025 4:08 UTC) · 23 points · 0 comments · 4 min read · LW link
Habitual mental motions might explain why people are content to get old and die (Ruby, 22 Nov 2025 2:52 UTC) · 16 points · 1 comment · 7 min read · LW link
Diplomacy during AI takeoff (Nikola Jurkovic, 22 Nov 2025 2:12 UTC) · 15 points · 3 comments · 2 min read · LW link (nikolajurkovic.substack.com)
Abstract advice to researchers tackling the difficult core problems of AGI alignment (TsviBT, 22 Nov 2025 0:53 UTC) · 90 points · 8 comments · 8 min read · LW link
Why Not Just Train For Interpretability? (johnswentworth, 21 Nov 2025 22:08 UTC) · 41 points · 5 comments · 4 min read · LW link
Models not making it clear when they’re roleplaying seems like a fairly big issue (williawa, 21 Nov 2025 20:23 UTC) · 16 points · 0 comments · 6 min read · LW link
Natural Emergent Misalignment from Reward Hacking (Algon, 21 Nov 2025 20:20 UTC) · 12 points · 0 comments · 3 min read · LW link (www.anthropic.com)
Natural emergent misalignment from reward hacking in production RL (evhub, Monte M, Benjamin Wright and Jonathan Uesato, 21 Nov 2025 20:00 UTC) · 175 points · 17 comments · 9 min read · LW link
Eight Heuristics of Anti-Epistemology (Ben Pace, 21 Nov 2025 19:54 UTC) · 21 points · 2 comments · 6 min read · LW link
We won’t solve non-alignment problems by doing research (MichaelDickens, 21 Nov 2025 18:03 UTC) · 12 points · 2 comments · 4 min read · LW link
Can Artificial Intelligence Be Conscious? (Bentham's Bulldog, 21 Nov 2025 16:43 UTC) · 12 points · 3 comments · 7 min read · LW link
Why Does Empathy Have an Off-Switch? (J Bostock, 21 Nov 2025 14:56 UTC) · 9 points · 1 comment · 7 min read · LW link
What Do We Tell the Humans? Errors, Hallucinations, and Lies in the AI Village (Shoshannah Tekofsky, 21 Nov 2025 14:19 UTC) · 42 points · 0 comments · 8 min read · LW link
Should I Apply to a 3.5% Acceptance-Rate Fellowship? A Simple EV Calculator (Tobias H, 21 Nov 2025 10:59 UTC) · 15 points · 0 comments · 5 min read · LW link