
ojorgensen

Karma: 195

AI Safety Researcher. My website is here.

Understanding Counterbalanced Subtractions for Better Activation Additions

ojorgensen · Aug 17, 2023, 1:53 PM
21 points
0 comments · 14 min read

Because of LayerNorm, Directions in GPT-2 MLP Layers are Monosemantic

ojorgensen · Jul 28, 2023, 7:43 PM
13 points
3 comments · 13 min read

UK Foundation Model Task Force - Expression of Interest

ojorgensen · Jun 18, 2023, 9:43 AM
64 points
2 comments · 1 min read
(twitter.com)

ojorgensen’s Shortform

ojorgensen · May 4, 2023, 1:51 PM
2 points
1 comment · 1 min read

(Extremely) Naive Gradient Hacking Doesn’t Work

ojorgensen · Dec 20, 2022, 2:35 PM
17 points
0 comments · 6 min read