WCargo comments on Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)

WCargo 9 Feb 2024 13:40 UTC
Quick question: you say that MLPs 2-6 gradually improve the representation of the athlete's sport, and that no single MLP does it in one go. Do you think the reason could be something like what this post describes? https://www.lesswrong.com/posts/8ms977XZ2uJ4LnwSR/decomposing-independent-generalizations-in-neural-networks

So MLPs 2-6 basically perform the same computation, but each in a different superposition basis, so that after several MLPs the model is fairly confident about the answer? If so, do you think there is more to say about how the bases are arranged, e.g. which concepts interfere with which? (I guess this could help answer questions like “how do we change the name-surname-sport lookup table”, which we currently cannot do.)

Thanks
- Neel Nanda 9 Feb 2024 23:08 UTC
  We dig into this in post 3. The layers compose importantly with each other and don’t seem to be doing the same thing in parallel: path patching the internal connections between them breaks things. So I don’t think it’s like what you’re describing.