Archive: Page 1
Prioritizing Work · jefftk · May 1, 2025, 2:00 AM · 38 points · 1 comment · 1 min read · LW link (www.jefftk.com)
Don’t rely on a “race to the top” · sjadler · May 1, 2025, 12:33 AM · 4 points · 0 comments · 1 min read · LW link
Meta-Technicalities: Safeguarding Values in Formal Systems · LTM · Apr 30, 2025, 11:43 PM · 2 points · 0 comments · 3 min read · LW link (routecause.substack.com)
Obstacles in ARC’s agenda: Finding explanations · David Matolcsi · Apr 30, 2025, 11:03 PM · 66 points · 1 comment · 17 min read · LW link
State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost] · Noosphere89 · Apr 30, 2025, 7:58 PM · 7 points · 0 comments · 5 min read · LW link (www.interconnects.ai)
Don’t accuse your interlocutor of being insufficiently truth-seeking · TFD · Apr 30, 2025, 7:38 PM · 18 points · 9 comments · 2 min read · LW link (www.thefloatingdroid.com)
How can we solve diffuse threats like research sabotage with AI control? · Vivek Hebbar · Apr 30, 2025, 7:23 PM · 34 points · 0 comments · 8 min read · LW link
[Question] Can Narrowing One’s Reference Class Undermine the Doomsday Argument? · Iannoose n. · Apr 30, 2025, 6:24 PM · 2 points · 0 comments · 1 min read · LW link
[Question] Does there exist an interactive reasoning map tool that lets users visually lay out claims, assign probabilities and confidence levels, and dynamically adjust their beliefs based on weighted influences between connected assertions? · Zack Friedman · Apr 30, 2025, 6:22 PM · 3 points · 0 comments · 1 min read · LW link
Distilling the Internal Model Principle part II · JoseFaustino · Apr 30, 2025, 5:56 PM · 13 points · 0 comments · 19 min read · LW link
Research Priorities for Hardware-Enabled Mechanisms (HEMs) · aog · Apr 30, 2025, 5:43 PM · 16 points · 2 comments · 15 min read · LW link (www.longview.org)
Video and transcript of talk on automating alignment research · Joe Carlsmith · Apr 30, 2025, 5:43 PM · 21 points · 0 comments · 24 min read · LW link (joecarlsmith.com)
Can we safely automate alignment research? · Joe Carlsmith · Apr 30, 2025, 5:37 PM · 32 points · 6 comments · 48 min read · LW link (joecarlsmith.com)
Investigating task-specific prompts and sparse autoencoders for activation monitoring · Henk Tillman · Apr 30, 2025, 5:09 PM · 16 points · 0 comments · 1 min read · LW link (arxiv.org)
Scaling Laws for Scalable Oversight · Subhash Kantamneni, Josh Engels, David Baek, and Max Tegmark · Apr 30, 2025, 12:13 PM · 19 points · 0 comments · 9 min read · LW link
Early Chinese Language Media Coverage of the AI 2027 Report: A Qualitative Analysis · jeanne_ and eeeee · Apr 30, 2025, 11:06 AM · 137 points · 4 comments · 11 min read · LW link
[Paper] Automated Feature Labeling with Token-Space Gradient Descent · Wuschel Schulz · Apr 30, 2025, 10:22 AM · 4 points · 0 comments · 4 min read · LW link
A single principle related to many Alignment subproblems? · Q Home · Apr 30, 2025, 9:49 AM · 25 points · 2 comments · 16 min read · LW link
Interpreting the METR Time Horizons Post · snewman · Apr 30, 2025, 3:03 AM · 65 points · 12 comments · 10 min read · LW link (amistrongeryet.substack.com)
Should we expect the future to be good? · Neil Crawford · Apr 30, 2025, 12:36 AM · 15 points · 0 comments · 14 min read · LW link