Hoagy

Karma: 1,068

Auditing language models for hidden objectives

Mar 13, 2025, 7:18 PM
138 points
15 comments · 13 min read · LW link

Some additional SAE thoughts

Hoagy · Jan 13, 2024, 7:31 PM
31 points
4 comments · 13 min read · LW link

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

Sep 21, 2023, 3:30 PM
159 points
8 comments · 5 min read · LW link

AutoInterpretation Finds Sparse Coding Beats Alternatives

Hoagy · Jul 17, 2023, 1:41 AM
57 points
1 comment · 7 min read · LW link

[Replication] Conjecture’s Sparse Coding in Small Transformers

Jun 16, 2023, 6:02 PM
52 points
0 comments · 5 min read · LW link

[Replication] Conjecture’s Sparse Coding in Toy Models

Jun 2, 2023, 5:34 PM
24 points
0 comments · 1 min read · LW link

Universality and Hidden Information in Concept Bottleneck Models

Hoagy · Apr 5, 2023, 2:00 PM
23 points
0 comments · 11 min read · LW link

Nokens: A potential method of investigating glitch tokens

Hoagy · Mar 15, 2023, 4:23 PM
21 points
0 comments · 4 min read · LW link

Automating Consistency

Hoagy · Feb 17, 2023, 1:24 PM
10 points
0 comments · 1 min read · LW link

Distilled Representations Research Agenda

Oct 18, 2022, 8:59 PM
15 points
2 comments · 8 min read · LW link

Remaking EfficientZero (as best I can)

Hoagy · Jul 4, 2022, 11:03 AM
36 points
9 comments · 22 min read · LW link

Note-Taking without Hidden Messages

Hoagy · Apr 30, 2022, 11:15 AM
17 points
2 comments · 4 min read · LW link

ELK Sub - Note-taking in internal rollouts

Hoagy · Mar 9, 2022, 5:23 PM
6 points
0 comments · 5 min read · LW link

Automated Fact Checking: A Look at the Field

Hoagy · Oct 6, 2021, 11:52 PM
12 points
0 comments · 8 min read · LW link

Hoagy’s Shortform

Hoagy · Sep 21, 2020, 10:00 PM
3 points
12 comments · LW link

Safe Scrambling?

Hoagy · Aug 29, 2020, 2:31 PM
3 points
1 comment · 2 min read · LW link

When do utility functions constrain?

Hoagy · Aug 23, 2019, 5:19 PM
30 points
8 comments · 7 min read · LW link