Logan Riggs

Karma: 2,291

Was Releasing Claude-3 Net-Negative?

Logan Riggs, 27 Mar 2024 17:41 UTC
42 points
5 comments, 4 min read, LW link

Improving SAE’s by Sqrt()-ing L1 & Removing Lowest Activating Features

15 Mar 2024 16:30 UTC
17 points
5 comments, 4 min read, LW link

Finding Sparse Linear Connections between Features in LLMs

9 Dec 2023 2:27 UTC
68 points
5 comments, 10 min read, LW link

Sparse Autoencoders: Future Work

21 Sep 2023 15:30 UTC
34 points
5 comments, 6 min read, LW link

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

21 Sep 2023 15:30 UTC
156 points
7 comments, 5 min read, LW link

Really Strong Features Found in Residual Stream

Logan Riggs, 8 Jul 2023 19:40 UTC
69 points
6 comments, 2 min read, LW link

(tentatively) Found 600+ Monosemantic Features in a Small LM Using Sparse Autoencoders

Logan Riggs, 5 Jul 2023 16:49 UTC
58 points
1 comment, 7 min read, LW link

[Replication] Conjecture’s Sparse Coding in Small Transformers

16 Jun 2023 18:02 UTC
52 points
0 comments, 5 min read, LW link

[Replication] Conjecture’s Sparse Coding in Toy Models

2 Jun 2023 17:34 UTC
23 points
0 comments, 1 min read, LW link

[Simulators seminar sequence] #2 Semiotic physics—revamped

27 Feb 2023 0:25 UTC
23 points
23 comments, 13 min read, LW link

Making Implied Standards Explicit

Logan Riggs, 25 Feb 2023 20:02 UTC
20 points
0 comments, 4 min read, LW link

Proposal for Inducing Steganography in LMs

Logan Riggs, 12 Jan 2023 22:15 UTC
22 points
2 comments, 2 min read, LW link

[Simulators seminar sequence] #1 Background & shared assumptions

2 Jan 2023 23:48 UTC
49 points
4 comments, 3 min read, LW link

Results from a survey on tool use and workflows in alignment research

19 Dec 2022 15:19 UTC
79 points
2 comments, 19 min read, LW link

A descriptive, not prescriptive, overview of current AI Alignment Research

6 Jun 2022 21:59 UTC
138 points
21 comments, 7 min read, LW link

Frame for Take-Off Speeds to inform compute governance & scaling alignment

Logan Riggs, 13 May 2022 22:23 UTC
15 points
2 comments, 2 min read, LW link

Alignment as Constraints

Logan Riggs, 13 May 2022 22:07 UTC
10 points
0 comments, 2 min read, LW link

Make a Movie Showing Alignment Failures

Logan Riggs, 13 Apr 2022 21:54 UTC
75 points
11 comments, 2 min read, LW link

Convincing People of Alignment with Street Epistemology

Logan Riggs, 12 Apr 2022 23:43 UTC
54 points
4 comments, 3 min read, LW link

Roam Research Mobile is Out!

Logan Riggs, 8 Apr 2022 19:05 UTC
12 points
0 comments, 1 min read, LW link