Can

Karma: 250

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders

Dec 11, 2024, 6:30 AM
78 points
2 comments · 2 min read · LW link
(www.neuronpedia.org)

Evaluating Sparse Autoencoders with Board Game Models

Aug 2, 2024, 7:50 PM
38 points
1 comment · 9 min read · LW link

OthelloGPT learned a bag of heuristics

Jul 2, 2024, 9:12 AM
108 points
10 comments · 9 min read · LW link

Past Tense Features

Apr 20, 2024, 2:34 PM
12 points
0 comments · 4 min read · LW link

An adversarial example for Direct Logit Attribution: memory management in gelu-4l

Aug 30, 2023, 5:36 PM
17 points
0 comments · 8 min read · LW link
(arxiv.org)

Understanding mesa-optimization using toy models

May 7, 2023, 5:00 PM
43 points
2 comments · 10 min read · LW link

Safety of Self-Assembled Neuromorphic Hardware

Dec 26, 2022, 6:51 PM
16 points
2 comments · 10 min read · LW link
(forum.effectivealtruism.org)