As a vocal critic of the whole concept of superposition, I’ve had my mind changed a lot by this post. It offers an actual mathematical definition that doesn’t depend on any fuzzy notions of what is ‘human interpretable’, and a start on actual algorithms for performing general, useful computation on overcomplete bases of variables.
Everything I’ve read on superposition before this was pretty much only outlining how you could store and access lots of variables from a linear space with sparse encoding, which isn’t exactly a revelation. Each of the n directions holds a float, so of course the space can store roughly (number of distinct float values) to the n-th power different states, which you can describe as superposed sparse features if you like. But I didn’t need to use that lens to talk about the compression. I could just talk about good old non-overcomplete linear algebra bases instead, with the ≤n basis vectors in that description being the compositional summary variables the sparse inputs got compressed into. If basically all we can do with the ‘superposed variables’ is make lookup tables of them, there didn’t seem to me to be much need for the concept at all when reverse engineering neural networks. Just stick with the summary variables; summarising is what intelligence is all about.
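To make concrete the kind of store-and-look-up scheme I mean, here is a minimal sketch. The sizes, the random ±1 directions, and the 0.5 readout threshold are all my own illustrative assumptions, not anything taken from the post.

```python
import random

def random_sign_vector(n, rng):
    # Unit-norm vector with entries +-1/sqrt(n); distinct random vectors
    # of this kind are nearly orthogonal when n is large.
    scale = n ** -0.5
    return [scale if rng.random() < 0.5 else -scale for _ in range(n)]

def encode(active, directions):
    # Superpose the directions of the active features into one n-dim vector.
    n = len(directions[0])
    x = [0.0] * n
    for i in active:
        for j, d in enumerate(directions[i]):
            x[j] += d
    return x

def decode(x, directions, threshold=0.5):
    # Linear readout: a feature counts as "on" if its dot product with x
    # clears the threshold (assumed value; interference is small but nonzero).
    return {i for i, d in enumerate(directions)
            if sum(a * b for a, b in zip(x, d)) > threshold}

rng = random.Random(0)
n, m = 512, 1024                  # m features stored in n < m dimensions
directions = [random_sign_vector(n, rng) for _ in range(m)]
active = {3, 141, 592}            # sparse: only 3 of the 1024 features are on
x = encode(active, directions)
recovered = decode(x, directions)
print(recovered == active)
```

This stores and retrieves more sparse variables than there are dimensions, but it computes nothing with them, which is why it never struck me as a reason to adopt the superposition lens.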
If we can do actual, general computation with the sparse variables? Computations with internal structure that we can’t trivially describe just as well using ≤n floats forming the non-overcomplete linear basis of a vector space? Well, that would change things.
As you note, there’s certainly work left to do here on the error propagation and checking for such algorithms in real networks. But even with this being an early proof of concept, I do now tentatively expect that better-performing implementations of this probably exist. And if such algorithms are possible, they sure do sound potentially extremely useful for an LLM’s job.
On my previous superposition-skeptical models, frameworks like the one described in this post are predicted to be basically impossible. Certainly way more cumbersome than this looks. So unless these ideas fall flat when more research is done on the error tolerance, I guess I was wrong. Oops.
Well. Damn.