I think you could imagine many different types of elementary units wrapped in different ontologies:
Information may be encoded linearly in a NN, with superposition or composition, locally or in a highly distributed way (see the figure below from Distributed Representations: Composition & Superposition); there's a toy sketch of superposition after this list.
Maybe a good way to understand NNs is polytope theory?
Maybe some form of memory is encoded as key-value pairs in the MLPs of transformers? (See the second sketch after this list.)
Or maybe you could think of NNs as Bayesian causal graphs.
Or maybe you should instead think in terms of algorithms inside transformers (induction heads, the modular addition algorithm, etc.), and it's not that meaningful to think in terms of linear directions.
Or most likely a mixture of everything.
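To make the superposition point a bit more concrete, here is a minimal numpy sketch (the sizes, the random directions, and the sparsity pattern are all made up for illustration): more sparse features than there are dimensions can be stored as roughly orthogonal linear directions and read back off by projection, at the cost of some interference noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy superposition: store more sparse "features" than dimensions by giving
# each feature a random, nearly orthogonal direction (hypothetical sizes).
n_features, d_model = 50, 20
directions = rng.normal(size=(n_features, d_model))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# A sparse input: only a few features are active at once.
feature_vals = np.zeros(n_features)
feature_vals[[3, 17, 42]] = 1.0

# "Encode" by summing the active directions (a linear representation).
activation = feature_vals @ directions      # shape (d_model,)

# "Decode" by projecting back onto each feature direction.
readout = directions @ activation           # shape (n_features,)
# Should mostly recover {3, 17, 42}; interference between directions adds noise.
print(np.argsort(readout)[-3:])
```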
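And for the key-value view of MLPs (in the spirit of the "feed-forward layers are key-value memories" line of work), an equally simplified sketch with hypothetical sizes and no biases or layer norm: each hidden neuron's input weights act as a key and its output weights as a value, so the layer measures how well the input matches each key and writes the corresponding values back out.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_mlp = 16, 64                     # hypothetical sizes

# One transformer MLP layer viewed as a key-value memory:
# each hidden neuron has a "key" (a row of W_in) and a "value" (a row of W_out).
W_in = rng.normal(size=(d_mlp, d_model))    # keys
W_out = rng.normal(size=(d_mlp, d_model))   # values

def mlp(x):
    # How strongly the input matches each key, gated by the nonlinearity...
    match = np.maximum(W_in @ x, 0.0)       # ReLU(key . x)
    # ...determines how much of each value is written to the output.
    return match @ W_out

x = rng.normal(size=d_model)
print(mlp(x).shape)                          # (d_model,)
```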
Thanks, that’s the kind of answer I was looking for