nothoughtsheadempty

Karma: 12

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

3 Oct 2023 7:45 UTC
17 points
0 comments · 5 min read · LW link

lunaiscoding’s Shortform

nothoughtsheadempty
18 May 2023 21:41 UTC
1 point
1 comment · 1 min read · LW link