This work is very exciting to me, and I’m curious to hear the authors’ thoughts on whether we could verify specific predictions of this framework in real models.
For example, take the proposed U-AND operator: do we expect it to occur in real LLMs, and could we look for evidence of it by applying mech interp to carefully chosen toy models?
I have a more detailed write-up on model organisms of superposition here: https://docs.google.com/document/d/1hwI30HNNB2MkOrtEzo7hppG9X7Cn7Xm9a-1LBqcttWc/edit?usp=sharing
Would love to discuss this more!