Logan Riggs
Karma: 3,021
Veo-2 Can Produce Realistic Ads
Logan Riggs
Jan 21, 2025, 7:13 PM | 14 points | 0 comments | 1 min read
[Exercise] Four Examples of Noticing Confusion
Logan Riggs
Jan 18, 2025, 3:29 PM | 8 points | 8 comments | 3 min read
How do you deal w/ Super Stimuli?
Logan Riggs
Jan 14, 2025, 3:14 PM | 100 points | 25 comments | 3 min read
When AI 10x’s AI R&D, What Do We Do?
Logan Riggs
Dec 21, 2024, 11:56 PM | 72 points | 16 comments | 4 min read
Logan Riggs’s Shortform
Logan Riggs
Dec 4, 2024, 2:52 PM | 7 points | 13 comments | 1 min read
Book a Time to Chat about Interp Research
Logan Riggs
Dec 3, 2024, 5:27 PM | 47 points | 3 comments | 1 min read
Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen, Sam Marks, Can, Benjamin Wright, Jannik Brinkmann, Logan Riggs and Rico Angell
Aug 2, 2024, 7:50 PM | 38 points | 1 comment | 9 min read
Interpreting Preference Models w/ Sparse Autoencoders
Logan Riggs and Jannik Brinkmann
Jul 1, 2024, 9:35 PM | 74 points | 12 comments | 9 min read
Was Releasing Claude-3 Net-Negative?
Logan Riggs
Mar 27, 2024, 5:41 PM | 52 points | 5 comments | 4 min read
Improving SAE’s by Sqrt()-ing L1 & Removing Lowest Activating Features
Logan Riggs and Jannik Brinkmann
Mar 15, 2024, 4:30 PM | 26 points | 5 comments | 4 min read
Finding Sparse Linear Connections between Features in LLMs
Logan Riggs, Sam Mitchell and Adam Kaufman
Dec 9, 2023, 2:27 AM | 70 points | 5 comments | 10 min read
Sparse Autoencoders: Future Work
Logan Riggs and Aidan Ewart
Sep 21, 2023, 3:30 PM | 35 points | 5 comments | 6 min read
Sparse Autoencoders Find Highly Interpretable Directions in Language Models
Logan Riggs, Hoagy, Aidan Ewart and Robert_AIZI
Sep 21, 2023, 3:30 PM | 159 points | 8 comments | 5 min read
Really Strong Features Found in Residual Stream
Logan Riggs
Jul 8, 2023, 7:40 PM | 69 points | 6 comments | 2 min read
(tentatively) Found 600+ Monosemantic Features in a Small LM Using Sparse Autoencoders
Logan Riggs
Jul 5, 2023, 4:49 PM | 60 points | 1 comment | 7 min read
[Replication] Conjecture’s Sparse Coding in Small Transformers
Hoagy and Logan Riggs
Jun 16, 2023, 6:02 PM | 52 points | 0 comments | 5 min read
[Replication] Conjecture’s Sparse Coding in Toy Models
Hoagy and Logan Riggs
Jun 2, 2023, 5:34 PM | 24 points | 0 comments | 1 min read
[Simulators seminar sequence] #2 Semiotic physics—revamped
Jan, Charlie Steiner, Logan Riggs, janus, jacquesthibs, metasemi, Michael Oesterle, Lucas Teixeira, peligrietzer and remember
Feb 27, 2023, 12:25 AM | 24 points | 23 comments | 13 min read
Making Implied Standards Explicit
Logan Riggs
Feb 25, 2023, 8:02 PM | 22 points | 0 comments | 4 min read
Proposal for Inducing Steganography in LMs
Logan Riggs
Jan 12, 2023, 10:15 PM | 22 points | 3 comments | 2 min read