You May Want to Know About Locally Decodable Codes
In AI alignment and interpretability research, there’s a compelling intuition that understanding equals compression. The idea is straightforward: if you truly understand a system, you can describe it more concisely by leveraging that understanding. This philosophy suggests that better interpretability techniques for neural networks should yield better compression of their behavior or parameters.
jake_mendel asks: if understanding equals compression, then shouldn’t ZIP compression of neural network weights count as understanding? After all, ZIP achieves remarkable compression ratios on neural network weights—likely better than any current interpretability technique. Yet intuitively, having a ZIP file of weights doesn’t feel like understanding at all! We wouldn’t say we’ve interpreted a neural network just because we’ve compressed its weights into a ZIP file.
Compressing a bit string means finding a code for that string, and the study of such codes is the central topic of both algorithmic and Shannon information theory. Just compressing the weights as much as possible is too naive—we probably want to impose additional properties on the codes.
One crucial property we might want is “local decodability”: if you ask a question about any specific part of the original neural network, you should be able to answer it by examining only a small portion of the compressed representation. You shouldn’t need to decompress the entire thing just to understand one small aspect of how the network operates. This matches our intuitions about human understanding—when we truly understand something, we can answer specific questions about it without having to review everything we know.
A Locally Decodable Code (LDC) is a special type of error-correcting code that allows recovery of any single bit of the original message by querying only a small number of bits of the encoded message, even in the presence of some corruption. This property stands in stark contrast to ZIP compression, which requires processing the entire compressed file sequentially to recover any specific piece of information. ZIP compression is not locally decodable.
There’s a fundamental tension between how compact an LDC can be (its rate) and how many bits you need to query to decode a single piece of information (query complexity). You can’t make an LDC that only needs to look at one position, and if you restrict yourself to two queries, your code length must grow exponentially with message size.
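For concreteness, the textbook two-query LDC is the Hadamard code: an n-bit message is encoded as all 2^n parities of its bits, and any single message bit can be recovered by XORing just two codeword positions. A minimal Python sketch (the function names are purely illustrative):

```python
import random

def hadamard_encode(bits):
    """Hadamard code: list all 2^n parities <x, a> mod 2 of the n-bit message x."""
    n = len(bits)
    return [sum(b & ((a >> j) & 1) for j, b in enumerate(bits)) % 2
            for a in range(2 ** n)]

def decode_bit(codeword, n, i):
    """Recover message bit i from just two positions of the (possibly corrupted) codeword."""
    a = random.randrange(2 ** n)      # a uniformly random position...
    b = a ^ (1 << i)                  # ...and its partner, offset by e_i
    return codeword[a] ^ codeword[b]  # <x, a> xor <x, a + e_i> = x_i

msg = [1, 0, 1, 1]
cw = hadamard_encode(msg)             # length 2^4 = 16: the exponential blowup in action
assert all(decode_bit(cw, len(msg), i) == msg[i] for i in range(len(msg)))
```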
This technical tradeoff might reflect something deeper about the nature of understanding. Perhaps true understanding requires both compression (representing information concisely) and accessibility (being able to quickly retrieve specific pieces of information), and there are fundamental limits to achieving both simultaneously.
@Lucius Bushnaq @Matthias Dellago
I don’t remember the details, but IIRC ZIP is mostly based on Lempel-Ziv, and it’s fairly straightforward to modify Lempel-Ziv to allow for efficient local decoding.
My guess would be that the large majority of the compression achieved by ZIP on NN weights is because the NN weights are mostly-roughly-standard-normal, and IEEE floats are not very efficient for standard normal variables. So ZIP achieves high compression for “kinda boring reasons”, in the sense that we already knew all about that compressibility but just don’t leverage it in day-to-day operations because our float arithmetic hardware uses IEEE.
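One quick way to poke at this guess (a rough sketch only: zlib and an arbitrary array size stand in for “ZIP the weights”, and the exact ratios will depend on the model and settings):

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for trained NN weights: roughly standard-normal float32 values.
weights = rng.standard_normal(1_000_000).astype(np.float32)
raw = weights.tobytes()
print(f"float32 Gaussian: {len(zlib.compress(raw, 9)) / len(raw):.3f} of original size")

# Baseline: uniform random bytes of the same length, which are incompressible.
noise = rng.integers(0, 256, size=len(raw), dtype=np.uint8).tobytes()
print(f"uniform bytes:    {len(zlib.compress(noise, 9)) / len(raw):.3f} of original size")
```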
John, you know coding theory much better than I do, so I am inclined to defer to your superior knowledge...
Now behold the awesome power of gpt-Pro
Let’s unpack the question in pieces:
1. Is ZIP (a.k.a. DEFLATE) “locally decodable” or not?
Standard ZIP files are typically not “locally decodable” in the strictest sense—i.e., you cannot start decoding exactly at the byte corresponding to your region of interest and reconstruct just that portion without doing some earlier decoding.
The underlying method, DEFLATE, is indeed based on LZ77 plus Huffman coding. LZ77 uses back‐references to prior segments of the uncompressed data, which forces you to at least partially decode from an earlier point to resolve those references.
However, many ZIP files are stored in blocks or chunks—sometimes called “deflate blocks.” So if you only need data near the start of a block (rather than in the middle of one), you can skip previous blocks and just decode that chunk. Still, you cannot trivially jump to an arbitrary byte offset inside the block without re-initializing the LZ77 state from somewhere earlier.
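To make the sequential-decoding point concrete, here is a toy sketch with Python’s zlib, treating the compressed data as one monolithic stream (the data and chunk size are arbitrary): recovering a single byte still forces you to reconstruct every byte before it.

```python
import zlib

data = b"abcdefgh" * 100_000          # 800 kB of original data
stream = zlib.compress(data)          # one monolithic DEFLATE stream

def read_byte(stream, offset):
    # No random access: back-references mean we must materialize the whole prefix.
    d = zlib.decompressobj()
    out = b""
    for i in range(0, len(stream), 4096):
        out += d.decompress(stream[i:i + 4096])
        if len(out) > offset:
            return out[offset]
    out += d.flush()
    return out[offset]

assert read_byte(stream, 700_003) == data[700_003]
```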
2. If ZIP is “based on Lempel–Ziv,” can we easily modify it for local decodability?
Because Lempel–Ziv (LZ) methods revolve around referencing earlier uncompressed substrings, one must store enough information (or replicate the references with separate “partial dictionaries”) if one wants truly local decodability.
One can certainly modify an LZ-based compression scheme so that each block is smaller, or that references never cross block boundaries, etc. That would allow a more local decoding property: you only decode from the start of the local block rather than from the entire file.
This usually comes with tradeoffs. If your blocks are small, you lose compression efficiency because you can’t take advantage of references bridging across large swaths of data. So making a compression format “locally decodable” in a fine‐grained way often lowers the overall compression ratio.
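A sketch of that kind of modification, using independent zlib streams per block so that any byte can be recovered by touching a single block (the block size and the flat block list are arbitrary illustrative choices):

```python
import zlib

def compress_blocks(data: bytes, block_size: int = 64 * 1024):
    """Compress data as independent zlib blocks; no reference ever crosses a block boundary."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)], block_size

def read_byte(blocks, block_size, offset):
    """Recover the byte at `offset` by decompressing only the block that contains it."""
    block = zlib.decompress(blocks[offset // block_size])
    return block[offset % block_size]

data = bytes(range(256)) * 10_000
blocks, bs = compress_blocks(data)
assert read_byte(blocks, bs, 123_456) == data[123_456]
```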
3. Is Lempel–Ziv (and thus ZIP) lossless?
Yes. Standard LZ77 (or LZ78, LZMA, etc.) is a lossless compression scheme: you can reconstruct the exact original data bit for bit. The DEFLATE algorithm used in ZIP is also lossless.
4. Does the picture change when doing lossy compression?
Yes, quite a bit. In lossy compression (like JPEG for images or MP3/AAC for audio), parts of the original data are thrown away. Often you can still decode small “chunks” of the file—e.g., many lossy audio or video formats do decode in frames or segments. But the format internals (e.g., DCT coefficients in JPEG, wavelet transforms, etc.) differ a lot from LZ-based bit-by-bit exactness.
In lossy formats that use transform blocks (like 8×8 pixel blocks in JPEG), you can decode each block locally, but you still have boundary effects, optional overlap between blocks, etc. So “local decodability” there can be simpler in some ways (each block is more independent), but it’s also less “exact” because we accept errors from the original data.
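A rough sketch of the transform-block idea, with scipy’s DCT standing in for a real JPEG pipeline (the toy image, block location, and the crude low-frequency mask are all arbitrary):

```python
import numpy as np
from scipy.fft import dctn, idctn

# Toy "image": a smooth gradient, carved into 8x8 blocks as in JPEG.
image = np.add.outer(np.arange(64.0), np.arange(64.0))

def encode_block(block, keep=4):
    """JPEG-flavoured idea only: 2D DCT, then crudely drop all but a few low frequencies."""
    coeffs = dctn(block, norm="ortho")
    coeffs[keep:, :] = 0
    coeffs[:, keep:] = 0
    return coeffs

def decode_block(coeffs):
    return idctn(coeffs, norm="ortho")

# Any block can be encoded and decoded without touching the rest of the image:
# locality comes easily here, but the reconstruction is only approximate.
block = image[24:32, 40:48]
approx = decode_block(encode_block(block))
print(np.max(np.abs(approx - block)))  # nonzero: information was thrown away
```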
Short Summary
ZIP/DEFLATE is not strictly locally decodable because LZ77 references require that you decode from (at least) the beginning of the current block.
You can modify an LZ-based scheme (for instance, by using smaller blocks or restricting cross‐block references) to make it more locally decodable, but that typically sacrifices some compression ratio.
LZ methods and ZIP are lossless.
With lossy compression, “local decodability” can be easier in the sense that each block may be decoded without referencing data too far outside itself—but at the cost that you’re not reconstructing the original data exactly.
Using ZIP as a compression metric for NNs (I assume you do something along the lines of “take all the weights and line them up and then ZIP”) is unintuitive to me for the following reason: ZIP, though really this should apply to any other coding scheme that just tries to compress the weights by themselves, picks up on statistical patterns in the raw weights. But NNs are not simply a list of floats; they are arranged in a highly structured manner. The weights themselves get turned into functions, and it is 1. the functions and 2. the way the functions interact that we are ultimately trying to understand (and therefore compress).
To wit, a simple example for the first point: assume that inside your model is a 2x2 matrix with entries M=[0.587785, −0.809017, 0.809017, 0.587785]. Storing it like this will cost you a few bytes, and if you compress it you can roughly halve the cost, I believe. But really there is a much more compact way to store this information: this matrix represents a rotation by 54 degrees. Storing it this way requires only about a byte.
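For instance, a quick numpy check of the example (the tolerances are arbitrary):

```python
import numpy as np

M = np.array([[0.587785, -0.809017],
              [0.809017,  0.587785]])

# A 2x2 rotation matrix satisfies M @ M.T = I and det(M) = 1,
# and its angle can be read off with a single atan2.
assert np.allclose(M @ M.T, np.eye(2), atol=1e-5)
assert np.isclose(np.linalg.det(M), 1.0, atol=1e-5)
angle = np.degrees(np.arctan2(M[1, 0], M[0, 0]))
print(f"rotation by {angle:.1f} degrees")  # ~54 degrees: one small number stands in for four floats
```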
This phenomenon should get worse for bigger models. One reason is the following: if we believe that the NN uses superposition, then there will be an overcomplete basis in which all the computations are done (more) sparsely. And if we don’t factor that in, then ZIP will not capture that structure. (Caveat: this is my intuition, I don’t have empirical results to back this up.)
I think ZIP might pick up some structure (see e.g. here), just as in my example above it would pick up some sort of symmetry. But the decoder/encoder in your compression scheme should include/have access to more information regarding the model you are compressing. You might want to check out this post for an attempt at compressing model performance using interpretations.
This sounds right to me, but importantly it also matters what you are trying to understand (and thus compress). For AI safety, the thing we should be interested in is not the weights directly, but the behavior of the neural network. The behavior (the input-output mapping) is realized through a series of activations. Activations are realized through applying weights to inputs in particular ways. Weights are realized by setting up an optimization problem with a network architecture and training data. One could try compressing at any one of those levels, and of course they are all related, and in some sense if you know the earlier layer of abstraction you know the later one. But in another sense, they are fundamentally different, in exactly how quickly you can retrieve the specific piece of information, in this case the one we are interested in—which is the behavior. If I give you the training data, the network architecture, and the optimization algorithm, it still takes a lot of work to retrieve the behavior.
Thus, the story you gave about how accessibility matters also explains layers of abstraction, and how they relate to understanding.
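A toy sketch of that asymmetry, with a one-layer linear “network” standing in for a real model and plain gradient descent standing in for the whole training setup (all sizes and the learning rate are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 256))     # toy "training data"
W_true = rng.standard_normal((1, 3))  # toy "weights": a one-layer linear net
Y = W_true @ X                        # the "behavior" on these inputs

# Decoding behavior from the weights: one cheap forward pass per query.
x_query = rng.standard_normal(3)
print(W_true @ x_query)

# Decoding behavior from (data, architecture, optimizer): redo the optimization.
W_hat = np.zeros((1, 3))
for _ in range(500):                  # gradient descent on squared error
    W_hat -= 0.5 * ((W_hat @ X - Y) @ X.T) / X.shape[1]
print(np.max(np.abs(W_hat - W_true))) # tiny, but only after doing all the work of training
```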
Another example of this is a dynamical system. The differential equation governing it is quite compact: $\dot{x}=f(x)$. But the set of possible trajectories can be quite complicated to describe, and to get them one has to essentially do all the annoying work of integrating the equation! Note that this has implications for compositionality of the systems: while one can compose two differential equations by e.g. adding in some cross term, the behaviors (read: trajectories) of the composite system do not compose, and so one is forced to integrate the new system from scratch!
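A minimal sketch with scipy, using the Lorenz system purely as a stock example of a compact $\dot{x}=f(x)$ whose behavior still has to be integrated out:

```python
from scipy.integrate import solve_ivp

# The "compressed" description: a three-line vector field (the Lorenz system).
def f(t, x, sigma=10.0, rho=28.0, beta=8 / 3):
    return [sigma * (x[1] - x[0]),
            x[0] * (rho - x[2]) - x[1],
            x[0] * x[1] - beta * x[2]]

# The "behavior": an actual trajectory, which only falls out of numerical integration.
sol = solve_ivp(f, (0, 40), [1.0, 1.0, 1.0], max_step=0.01)
print(sol.y.shape)  # thousands of states generated from a three-line rule

# Add a coupling term to compose this with another system and none of this work
# can be reused: the composite system has to be integrated from scratch.
```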
Now, if we want to understand the behavior of the dynamical system, what should we be trying to compress? How would our understanding look different if we compress the governing equations vs. the trajectories?
Indeed, a recent result shows that even three-query locally correctable codes (a close cousin of LDCs) must have code lengths that grow exponentially with message size:
https://www.quantamagazine.org/magical-error-correction-scheme-proved-inherently-inefficient-20240109/
Interesting! I think the problem is that dense/compressed information can be represented in ways in which it is not easily retrievable for a certain decoder. The Standard Model written in Chinese is a very compressed representation of human knowledge of the universe and completely inscrutable to me. Or take some maximally compressed code and pass it through a permutation. The information content is obviously the same, but it is illegible until you reverse the permutation.
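A toy version of the permutation point, with zlib output standing in for the “maximally compressed code” and a random byte permutation standing in for the re-representation the decoder doesn’t know about:

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
message = b"the quick brown fox jumps over the lazy dog " * 200
compressed = zlib.compress(message, 9)

# Scramble the compressed bytes with a secret permutation: same information, same length...
perm = rng.permutation(len(compressed))
scrambled = bytes(compressed[i] for i in perm)

# ...but (almost surely) unreadable to the standard decoder until the permutation is undone.
try:
    zlib.decompress(scrambled)
except zlib.error as e:
    print("decoder fails on the scrambled version:", e)

restored = bytes(scrambled[i] for i in np.argsort(perm))
assert zlib.decompress(restored) == message
```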
In some ways it is uniquely easy to do this to codes with maximal entropy, because by definition it will be impossible to detect a pattern and recover a readable explanation.
In some ways the compressibility of NNs is a proof that a simple model exists, without revealing an understandable explanation.
I think we can have an (almost) minimal yet readable model without the exponentially decreasing information density required by LDCs.
Hm, feels off to me. What privileges the original representation of the uncompressed file as the space in which locality matters? I can buy the idea that understanding is somehow related to a description that can separate the whole into parts, but why do the boundaries of those parts have to live in the representation of the file I’m handed? Why can’t my explanation have parts in some abstract space instead? Lots of explanations of phenomena seem to work like that.
Maybe it’s more correct to say that understanding requires specifically compositional compression: compression that maintains an interface-based structure, hence allowing us to reason about parts without decompressing the whole, and that keeps complexity roughly constant as systems scale, which parallels local decodability. ZIP achieves high compression but loses compositionality.
Wouldn’t the insight into understanding be in the encoding, particularly how the encoder discriminates between what is necessary to ‘understand’ a particular function of a system and what is not salient? (And if I may speculate wildly, in organisms this may correlate with dopamine in the Nucleus Accumbens. Maybe.)
All mental models of the world are inherently lossy; this is the map-territory analogy in a nutshell (itself a lossy model). The effectiveness or usefulness of a representation determines the level of ‘understanding’, and this is entirely dependent on the apparent salience at the time of encoding, which determines which elements are given higher fidelity in encoding and which are more lossy. Perhaps this example will stretch the use of ‘understanding’, but consider a fairly crowded room at a conference where there are a lot of different conversations and dialogues—I see a friend gesticulating at me on the far side of the room. Once they realize I’ve made eye contact they start pointing surreptitiously to their left—so I look immediately to their left (my right) and see five different people and a strange painting on the wall—all possible candidates for what they are pointing at; perhaps it’s the entire circle of people.
Now I’m not sure at this point that the entire ‘message’ (the message here being all the possible candidates for what my friend is pointing at) has been ‘encoded’ such that an LDC could be used to single out (decode) the true subject. Or has it? In this example, I would have failed to reach ‘understanding’ of their pointing gesture (although I did understand their previous attempt to get my attention).
Now suppose my friend was pointing not at the five people or at the painting at all, but at something, or a sixth someone, further on: a distinguished colleague is drunk, let’s say—but I hadn’t noticed. If I had seen that colleague, I would have understood my friend’s pointing gesture. This goes beyond LDC, because you can’t retrieve a local code of something which extends beyond the full, uncompressed message.
Does this make sense to anyone? Please guide me if I’m very mistaken.
I think a Locally Decodable Code is perhaps less analogous to understanding itself, and more a mental tool for thinking about how we recall and operate with something we do already understand. For example, looking back on the conference, my friend says “hey remember when I was pointing at you”—that means I don’t need to decode the entire memory of the conference (every speech, every interaction I had), only that isolated moment. Efficient!