PhD student at UCL. Interested in mech interp.
Daniel Tan
This work is very exciting to me, and I’m curious to hear the authors’ thoughts on whether we could verify specific predictions made by this model in real models.
For example, do we expect the proposed U-AND operator to occur in real LLMs, and could we look for evidence of it by applying mech interp to carefully chosen toy models?
I have a more detailed write-up on model organisms of superposition here: https://docs.google.com/document/d/1hwI30HNNB2MkOrtEzo7hppG9X7Cn7Xm9a-1LBqcttWc/edit?usp=sharing
Would love to discuss this more!
Hey Jacob + Philippe,
I took the liberty of making a clean installable version of your original codebase. Hope you don’t mind, and happy to make any changes that you request! https://github.com/dtch1997/transcoders-slim