Can

Karma: 257

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders

Dec 11, 2024, 6:30 AM
82 points
6 comments · 2 min read · LW link
(www.neuronpedia.org)

Evaluating Sparse Autoencoders with Board Game Models

2 Aug 2024 19:50 UTC
38 points
1 comment · 9 min read · LW link

OthelloGPT learned a bag of heuristics

2 Jul 2024 9:12 UTC
111 points
10 comments · 9 min read · LW link

Past Tense Features

20 Apr 2024 14:34 UTC
12 points
0 comments · 4 min read · LW link

An adversarial example for Direct Logit Attribution: memory management in gelu-4l

30 Aug 2023 17:36 UTC
17 points
0 comments · 8 min read · LW link
(arxiv.org)

Understanding mesa-optimization using toy models

7 May 2023 17:00 UTC
43 points
2 comments · 10 min read · LW link

Safety of Self-Assembled Neuromorphic Hardware

26 Dec 2022 18:51 UTC
16 points
2 comments · 10 min read · LW link
(forum.effectivealtruism.org)