Daniel Lee
Karma: 78
Finding Features Causally Upstream of Refusal
Daniel Lee, Eric Breck and Andy Arditi
14 Jan 2025 2:30 UTC
54 points, 5 comments, 12 min read
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee and StefanHex
6 Sep 2024 2:28 UTC
28 points, 0 comments, 12 min read