Robert_AIZI

Karma: 1,388

SAEs you can See: Applying Sparse Autoencoders to Clustering

Robert_AIZI

Oct 28, 2024, 2:48 PM

27 points

0 comments

10 min read

LW link

Comments on Anthropic’s Scaling Monosemanticity

Robert_AIZI

Jun 3, 2024, 12:15 PM

98 points

8 comments

7 min read

LW link

Explaining a Math Magic Trick

Robert_AIZI

May 5, 2024, 7:41 PM

99 points

10 comments

5 min read

LW link

Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT

Robert_AIZI

Mar 5, 2024, 1:55 PM

61 points

24 comments

10 min read

LW link

(aizi.substack.com)

Rating my AI Predictions

Robert_AIZI

Dec 21, 2023, 2:07 PM

22 points

5 comments

2 min read

LW link

(aizi.substack.com)

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI

Oct 7, 2023, 11:30 PM

137 points

8 comments

4 min read

LW link

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

Logan Riggs, Hoagy, Aidan Ewart and Robert_AIZI

Sep 21, 2023, 3:30 PM

159 points

8 comments

5 min read

LW link

Unsafe AI as Dynamical Systems

Robert_AIZI

Jul 14, 2023, 3:31 PM

11 points

0 comments

3 min read

LW link

(aizi.substack.com)

AIs teams will probably be more superintelligent than individual AIs

Robert_AIZI

Jul 4, 2023, 2:06 PM

3 points

1 comment

2 min read

LW link

(aizi.substack.com)

[Research Update] Sparse Autoencoder features are bimodal

Robert_AIZI

Jun 22, 2023, 1:15 PM

24 points

1 comment

5 min read

LW link

(aizi.substack.com)

Explaining “Taking features out of superposition with sparse autoencoders”

Robert_AIZI

Jun 16, 2023, 1:59 PM

10 points

0 comments

8 min read

LW link

(aizi.substack.com)

[Question] Question for Prediction Market people: where is the money supposed to come from?

Robert_AIZI

Jun 8, 2023, 1:58 PM

25 points

26 comments

1 min read

LW link

Is behavioral safety “solved” in non-adversarial conditions?

Robert_AIZI

May 25, 2023, 5:56 PM

26 points

8 comments

2 min read

LW link

(aizi.substack.com)

Research Report: Incorrectness Cascades (Corrected)

Robert_AIZI

May 9, 2023, 9:54 PM

9 points

0 comments

9 min read

LW link

(aizi.substack.com)

I was Wrong, Simulator Theory is Real

Robert_AIZI

Apr 26, 2023, 5:45 PM

75 points

7 comments

3 min read

LW link

(aizi.substack.com)

The Toxoplasma of AGI Doom and Capabilities?

Robert_AIZI

Apr 24, 2023, 6:11 PM

72 points

12 comments

1 min read

LW link

Study 1b: This One Weird Trick does NOT cause incorrectness cascades

Robert_AIZI

Apr 20, 2023, 6:10 PM

5 points

0 comments

6 min read

LW link

(aizi.substack.com)

Research Report: Incorrectness Cascades

Robert_AIZI

Apr 14, 2023, 12:49 PM

19 points

0 comments

10 min read

LW link

(aizi.substack.com)

Pre-registering a study

Robert_AIZI

Apr 7, 2023, 3:46 PM

10 points

0 comments

6 min read

LW link

(aizi.substack.com)

Invocations: The Other Capabilities Overhang?

Robert_AIZI

Apr 4, 2023, 1:38 PM

29 points

4 comments

4 min read

LW link

(aizi.substack.com)