Robert_AIZI
Karma: 1,385
SAEs you can See: Applying Sparse Autoencoders to Clustering
Robert_AIZI · 28 Oct 2024 14:48 UTC · 27 points · 0 comments · 10 min read · LW link
Comments on Anthropic’s Scaling Monosemanticity
Robert_AIZI · 3 Jun 2024 12:15 UTC · 97 points · 8 comments · 7 min read · LW link
Explaining a Math Magic Trick
Robert_AIZI · 5 May 2024 19:41 UTC · 97 points · 10 comments · 5 min read · LW link
Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Robert_AIZI · 5 Mar 2024 13:55 UTC · 61 points · 24 comments · 10 min read · LW link (aizi.substack.com)
Rating my AI Predictions
Robert_AIZI · 21 Dec 2023 14:07 UTC · 22 points · 5 comments · 2 min read · LW link (aizi.substack.com)
Comparing Anthropic’s Dictionary Learning to Ours
Robert_AIZI · 7 Oct 2023 23:30 UTC · 137 points · 8 comments · 4 min read · LW link
Sparse Autoencoders Find Highly Interpretable Directions in Language Models
Logan Riggs, Hoagy, Aidan Ewart and Robert_AIZI · 21 Sep 2023 15:30 UTC · 159 points · 8 comments · 5 min read · LW link
Unsafe AI as Dynamical Systems
Robert_AIZI · 14 Jul 2023 15:31 UTC · 11 points · 0 comments · 3 min read · LW link (aizi.substack.com)
AIs teams will probably be more superintelligent than individual AIs
Robert_AIZI · 4 Jul 2023 14:06 UTC · 3 points · 1 comment · 2 min read · LW link (aizi.substack.com)
[Research Update] Sparse Autoencoder features are bimodal
Robert_AIZI · 22 Jun 2023 13:15 UTC · 24 points · 1 comment · 5 min read · LW link (aizi.substack.com)
Explaining “Taking features out of superposition with sparse autoencoders”
Robert_AIZI · 16 Jun 2023 13:59 UTC · 10 points · 0 comments · 8 min read · LW link (aizi.substack.com)
[Question] Question for Prediction Market people: where is the money supposed to come from?
Robert_AIZI · 8 Jun 2023 13:58 UTC · 25 points · 26 comments · 1 min read · LW link
Is behavioral safety “solved” in non-adversarial conditions?
Robert_AIZI · 25 May 2023 17:56 UTC · 26 points · 8 comments · 2 min read · LW link (aizi.substack.com)
Research Report: Incorrectness Cascades (Corrected)
Robert_AIZI · 9 May 2023 21:54 UTC · 9 points · 0 comments · 9 min read · LW link (aizi.substack.com)
I was Wrong, Simulator Theory is Real
Robert_AIZI · 26 Apr 2023 17:45 UTC · 75 points · 7 comments · 3 min read · LW link (aizi.substack.com)
The Toxoplasma of AGI Doom and Capabilities?
Robert_AIZI · 24 Apr 2023 18:11 UTC · 72 points · 12 comments · 1 min read · LW link
Study 1b: This One Weird Trick does NOT cause incorrectness cascades
Robert_AIZI · 20 Apr 2023 18:10 UTC · 5 points · 0 comments · 6 min read · LW link (aizi.substack.com)
Research Report: Incorrectness Cascades
Robert_AIZI · 14 Apr 2023 12:49 UTC · 19 points · 0 comments · 10 min read · LW link (aizi.substack.com)
Pre-registering a study
Robert_AIZI · 7 Apr 2023 15:46 UTC · 10 points · 0 comments · 6 min read · LW link (aizi.substack.com)
Invocations: The Other Capabilities Overhang?
Robert_AIZI · 4 Apr 2023 13:38 UTC · 29 points · 4 comments · 4 min read · LW link (aizi.substack.com)