
Karma: 1,357

Comments on Anthropic’s Scaling Monosemanticity

Robert_AIZI · 3 Jun 2024 12:15 UTC
97 points
8 comments · 7 min read · LW link

Explaining a Math Magic Trick

Robert_AIZI · 5 May 2024 19:41 UTC
97 points
10 comments · 5 min read · LW link

Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT

Robert_AIZI · 5 Mar 2024 13:55 UTC
61 points
24 comments · 10 min read · LW link

Rating my AI Predictions

Robert_AIZI · 21 Dec 2023 14:07 UTC
22 points
5 comments · 2 min read · LW link

Comparing Anthropic’s Dictionary Learning to Ours

Robert_AIZI · 7 Oct 2023 23:30 UTC
137 points
8 comments · 4 min read · LW link

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

21 Sep 2023 15:30 UTC
157 points
8 comments · 5 min read · LW link

Unsafe AI as Dynamical Systems

Robert_AIZI · 14 Jul 2023 15:31 UTC
11 points
0 comments · 3 min read · LW link

AIs teams will probably be more superintelligent than individual AIs

Robert_AIZI · 4 Jul 2023 14:06 UTC
3 points
1 comment · 2 min read · LW link

[Research Update] Sparse Autoencoder features are bimodal

Robert_AIZI · 22 Jun 2023 13:15 UTC
24 points
1 comment · 5 min read · LW link

Explaining “Taking features out of superposition with sparse autoencoders”

Robert_AIZI · 16 Jun 2023 13:59 UTC
10 points
0 comments · 8 min read · LW link

[Question] Question for Prediction Market people: where is the money supposed to come from?

Robert_AIZI · 8 Jun 2023 13:58 UTC
25 points
26 comments · 1 min read · LW link

Is behavioral safety “solved” in non-adversarial conditions?

Robert_AIZI · 25 May 2023 17:56 UTC
26 points
8 comments · 2 min read · LW link

Research Report: Incorrectness Cascades (Corrected)

Robert_AIZI · 9 May 2023 21:54 UTC
9 points
0 comments · 9 min read · LW link

I was Wrong, Simulator Theory is Real

Robert_AIZI · 26 Apr 2023 17:45 UTC
75 points
7 comments · 3 min read · LW link

The Toxoplasma of AGI Doom and Capabilities?

Robert_AIZI · 24 Apr 2023 18:11 UTC
72 points
12 comments · 1 min read · LW link

Study 1b: This One Weird Trick does NOT cause incorrectness cascades

Robert_AIZI · 20 Apr 2023 18:10 UTC
5 points
0 comments · 6 min read · LW link

Research Report: Incorrectness Cascades

Robert_AIZI · 14 Apr 2023 12:49 UTC
19 points
0 comments · 10 min read · LW link

Pre-registering a study

Robert_AIZI · 7 Apr 2023 15:46 UTC
10 points
0 comments · 6 min read · LW link

Invocations: The Other Capabilities Overhang?

Robert_AIZI · 4 Apr 2023 13:38 UTC
29 points
4 comments · 4 min read · LW link

Early Results: Do LLMs complete false equations with false equations?

Robert_AIZI · 30 Mar 2023 20:14 UTC
14 points
0 comments · 4 min read · LW link