Chris_Leong comments on The Plan − 2023 Version

Chris_Leong Dec 30, 2023, 9:17 AM
11 points
So you don’t believe that neurons are the right unit of analysis even after we use something like dictionary learning to remove superposition?
- johnswentworth Jan 3, 2024, 6:10 AM
  7 points
  I haven’t yet paid enough attention to dictionary learning myself to give a confident answer. But here’s an incomplete answer, and some of the things I’d think about if I were to investigate the topic more:
  - Insofar as dictionary learning is building on e.g. the Olah team’s Toy Models of Superposition work, at least some of the work of “discovering the ontology” has been done. So it’s not completely implausible that this method is basically right! I’d still be skeptical both about how well the Toy Models work generalizes, and about how well dictionary learning captures the phenomenon Toy Models found (e.g. “seems kinda intuitively related” doesn’t really cut it).
  - Implementation details matter. If there are degrees of freedom where people made arbitrary-looking design choices, that’s a very bad sign. (Note: link is to a post about ad-hoc mathematical definitions, but the same general considerations apply here.)
  - There’s still the “interpret the features as what?” side of the problem. E.g. insofar as things are based on the Toy Models phenomenon, the “as what” should be sparse features in the data/environment (where the phrase “sparse feature” is interpreted in the same specific way as the Toy Models work, not just some vague intuition about sparsity and features). And then we need to pretty carefully consider which human-intuitive stuff does and does not constitute “sparse features” in the data in that specific sense.
  - wassname Jan 4, 2024, 4:27 AM
    1 point
    
    > how well the Toy Models work generalizes
    
    It might be hard to scale it to large multilayer models too. The toy model was a single-layer model, where the sparse autoencoder was quite big. IIRC the latent space was 8 times as big as the residual stream. Imagine trying to interpret GPT-4 with an autoencoder that big, and needing to do it over most layers: it’s intractable.
    
    Maybe they can introduce more efficient ways to un-superposition the features, but it doesn’t look trivial.
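
To put rough numbers on the scaling worry above, here is a back-of-the-envelope sketch. It takes the 8x latent expansion recalled in the comment at face value and assumes one untied-weight sparse autoencoder per layer on the residual stream. GPT-4's dimensions are not public, so the figures below use GPT-3 175B's published width and depth (12288 and 96) purely as a stand-in; the function names are hypothetical. It counts parameters only and says nothing about training data or compute.

```python
def sae_param_count(d_model: int, expansion: int = 8) -> int:
    """Parameters in one sparse autoencoder with a single hidden (latent) layer.

    Assumes untied encoder/decoder weights plus one bias vector each.
    """
    d_latent = expansion * d_model
    encoder = d_model * d_latent + d_latent  # weights + latent bias
    decoder = d_latent * d_model + d_model   # weights + output bias
    return encoder + decoder


def total_sae_params(d_model: int, n_layers: int, expansion: int = 8) -> int:
    """Total parameters if one such autoencoder is trained for every layer."""
    return n_layers * sae_param_count(d_model, expansion)


if __name__ == "__main__":
    # Stand-in dimensions (GPT-3 175B), NOT GPT-4's, which are unpublished.
    D_MODEL, N_LAYERS = 12288, 96
    per_layer = sae_param_count(D_MODEL)
    total = total_sae_params(D_MODEL, N_LAYERS)
    print(f"per-layer autoencoder: {per_layer / 1e9:.2f}B parameters")
    print(f"all {N_LAYERS} layers: {total / 1e9:.1f}B parameters")
```

Under these assumptions the autoencoders together come to roughly 232B parameters, more than the 175B of the stand-in model itself. That is consistent with the concern above, though it does not by itself show the approach is intractable.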
