LLMs are likely not conscious

I think the sparse autoencoder line of interpretability work is somewhat convincing evidence that LLMs are not conscious.

In order for me to consciously take in some information (e.g. "the house is yellow"), I need to store not only the content of the statement but also some aspect of my conscious experience of it. That is, I need to store more bits than the minimum it would take to represent "the house is yellow".

The sparse autoencoder line of work appears to suggest that LLM activations essentially store "bits" representing "themes" in the text being processed, and close to nothing (at least in L2 norm) beyond that. Moreover, this holds at each layer. Thus, there doesn't appear to be any residual "space" left over for storing aspects of consciousness.
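To make the quantity I have in mind concrete, here is a minimal sketch in PyTorch of how one would measure "what's left over" after a sparse autoencoder explains a layer's activations. The dimensions, the random untrained weights, and the random activations are all placeholders of my own, not taken from any real model or SAE release; the point is only to show which ratio the claim is about.

```python
import torch

# Toy sparse autoencoder: activations -> sparse "feature" codes -> reconstruction.
# All dimensions and weights below are made-up stand-ins, not a real trained SAE.
d_model, d_features = 512, 4096

W_enc = torch.randn(d_model, d_features) / d_model**0.5
b_enc = torch.zeros(d_features)
W_dec = torch.randn(d_features, d_model) / d_features**0.5
b_dec = torch.zeros(d_model)

def sae_reconstruct(acts: torch.Tensor) -> torch.Tensor:
    """Encode activations into sparse, nonnegative feature codes, then decode back."""
    features = torch.relu(acts @ W_enc + b_enc)
    return features @ W_dec + b_dec

# Residual-stream activations at one layer (here: random stand-ins).
acts = torch.randn(1000, d_model)
recon = sae_reconstruct(acts)

# The claim is about this ratio: how much of the activation's L2 norm the
# sparse features fail to explain. With these untrained random weights the
# number will be large; the reported result for trained SAEs on real models
# is that it is small.
unexplained = torch.norm(acts - recon, dim=-1) / torch.norm(acts, dim=-1)
print(f"mean fraction of activation norm left unexplained: {unexplained.mean():.3f}")
```

If that unexplained fraction is close to zero at every layer, there is little "room" in the activations beyond the theme-like features, which is the sense of "no residual space" I mean above.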