Refusal in LLMs is mediated by a single direction
Andy Arditi, Oscar Obeso, Aaquib111, wesg and Neel Nanda · 27 Apr 2024 11:13 UTC · 191 points · 82 comments · 10 min read
Open Thread Spring 2024
habryka · 11 Mar 2024 19:17 UTC · 22 points · 144 comments · 1 min read
Language Models Model Us
eggsyntax · 17 May 2024 21:00 UTC · 134 points · 43 comments · 7 min read
[Question] Are most people deeply confused about “love”, or am I missing a human universal?
SpectrumDT · 23 May 2024 13:22 UTC · 2 points · 3 comments · 3 min read
The Button (Short Comic)
milanrosko · 22 May 2024 23:28 UTC · 2 points · 1 comment · 1 min read
Let’s make the truth easier to find
DPiepgrass · 20 Mar 2023 4:28 UTC · 24 points · 44 comments · 1 min read
Why entropy means you might not have to worry as much about superintelligent AI
Ron J · 23 May 2024 3:52 UTC · −13 points · 1 comment · 2 min read
A Bi-Modal Brain Model
Johannes C. Mayer · 22 May 2024 20:10 UTC · 12 points · 2 comments · 2 min read
What will the first human-level AI look like, and how might things go wrong?
EuanMcLean · 23 May 2024 11:17 UTC · 5 points · 1 comment · 15 min read
“Reframing Superintelligence” + LLMs + 4 years
Eric Drexler · 10 Jul 2023 13:42 UTC · 117 points · 9 comments · 12 min read
How to train your own “Sleeper Agents”
evhub · 7 Feb 2024 0:31 UTC · 91 points · 11 comments · 2 min read
[Question] SAE sparse feature graph using only residual layers
crayhippo · 23 May 2024 13:32 UTC · 0 points · 0 comments · 1 min read
Power Law Policy
Ben Turtel · 23 May 2024 5:28 UTC · 9 points · 2 comments · 6 min read
(bturtel.substack.com)
[Question] Which skincare products are evidence-based?
Vanessa Kosoy · 2 May 2024 15:22 UTC · 108 points · 44 comments · 1 min read
Each Llama3-8b text uses a different “random” subspace of the activation space
tailcalled · 22 May 2024 7:31 UTC · 3 points · 4 comments · 7 min read
What’s Going on With OpenAI’s Messaging?
ozziegooen · 21 May 2024 2:22 UTC · 158 points · 12 comments · 1 min read
Executive Dysfunction 101
DaystarEld · 23 May 2024 12:43 UTC · 7 points · 0 comments · 3 min read
(daystareld.com)
AI #65: I Spy With My AI
Zvi · 23 May 2024 12:40 UTC · 12 points · 0 comments · 43 min read
(thezvi.wordpress.com)
The predictive power of dissipative adaptation
dr_s · 17 Dec 2023 14:01 UTC · 45 points · 13 comments · 19 min read
What mistakes has the AI safety movement made?
EuanMcLean · 23 May 2024 11:19 UTC · 18 points · 0 comments · 12 min read