mwatkins

Karma: 1,704

Exploring the petertodd / Leilan duality in GPT-2 and GPT-J

mwatkins · Dec 23, 2024, 1:17 PM
12 points · 1 comment · 17 min read

Exploring SAE features in LLMs with definition trees and token lists

mwatkins · Oct 4, 2024, 10:15 PM
37 points · 5 comments · 6 min read

Navigating LLM embedding spaces using archetype-based directions

mwatkins · May 8, 2024, 5:54 AM
15 points · 4 comments · 28 min read

What’s up with all the non-Mormons? Weirdly specific universalities across LLMs

mwatkins · Apr 19, 2024, 1:43 PM
40 points · 13 comments · 27 min read

Phallocentricity in GPT-J’s bizarre stratified ontology

mwatkins · Feb 17, 2024, 12:16 AM
50 points · 37 comments · 9 min read

Mapping the semantic void III: Exploring neighbourhoods

mwatkins · Feb 15, 2024, 11:01 PM
13 points · 0 comments · 10 min read

Mapping the semantic void II: Above, below and between token embeddings

mwatkins · Feb 15, 2024, 11:00 PM
31 points · 4 comments · 10 min read

‘ petertodd’’s last stand: The final days of open GPT-3 research

mwatkins · Jan 22, 2024, 6:47 PM
109 points · 16 comments · 45 min read

Mapping the semantic void: Strange goings-on in GPT embedding spaces

mwatkins · Dec 14, 2023, 1:10 PM
114 points · 31 comments · 14 min read

Linear encoding of character-level information in GPT-J token embeddings

Nov 10, 2023, 10:19 PM
34 points · 4 comments · 28 min read

The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited

mwatkins · Jul 31, 2023, 7:47 PM
85 points · 29 comments · 20 min read

The ‘ petertodd’ phenomenon

mwatkins · Apr 15, 2023, 12:59 AM
192 points · 50 comments · 38 min read · 1 review

AI Safety Info Distillation Fellowship

Feb 17, 2023, 4:16 PM
47 points · 3 comments · 3 min read

SolidGoldMagikarp III: Glitch token archaeology

Feb 14, 2023, 10:17 AM
91 points · 35 comments · 16 min read

SolidGoldMagikarp II: technical details and more recent findings

Feb 6, 2023, 7:09 PM
113 points · 45 comments · 13 min read

SolidGoldMagikarp (plus, prompt generation)

Feb 5, 2023, 10:02 PM
682 points · 206 comments · 12 min read · 1 review

All AGI Safety questions welcome (especially basic ones) [~monthly thread]

Jan 26, 2023, 9:01 PM
39 points · 81 comments · 2 min read