I’m pretty sure information theory books mention this phenomenon (though not explicitly as a theorem) in discussions of stream arithmetic coding. An arithmetic-coded stream can be read as a fraction written in binary, where each bit narrows down a region of the [0,1) number line, with earlier-terminating bitstreams reserved for more likely source strings.
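A minimal sketch of that narrowing (my own illustration, using the usual convention that bit 0 selects the lower half of the current interval and bit 1 the upper half):

    # Each bit halves the current half-open interval in [0, 1):
    # bit 0 keeps the lower half, bit 1 keeps the upper half.
    def refine(bits):
        lo, hi = 0.0, 1.0
        for b in bits:
            mid = (lo + hi) / 2
            lo, hi = (lo, mid) if b == 0 else (mid, hi)
            print(f"bit {b}: interval = [{lo:.4f}, {hi:.4f})")
        return lo, hi

    refine([1, 0, 1, 1])  # 0.1011 in binary pins down [0.6875, 0.75)

A real coder divides the interval into subintervals proportional to symbol probabilities rather than always halving, but the narrowing mechanism is the same.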
Any probability distribution on the integers has a corresponding (optimal) arithmetic stream coder, and you can always make the decoder expect more input indefinitely by feeding it the bit that corresponds to the least likely next character (so it is never finished decompressing).
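Here is a toy demonstration of that adversarial behaviour, using a distribution I picked for convenience: the geometric distribution P(k) = 2^-(k+1) on k = 0, 1, 2, ..., whose cumulative probabilities give symbol k the cell [1 - 2^-k, 1 - 2^-(k+1)). A decoder can only emit symbol k once its interval fits entirely inside that cell, and feeding it nothing but 1-bits steers the code point toward 1.0, where ever-less-likely symbols live, so no cell ever contains the interval and the decoder keeps asking for input:

    # Toy decoder for P(k) = 2^-(k+1): symbol k owns the cell
    # [1 - 2^-k, 1 - 2^-(k+1)). Exact rationals avoid float round-off,
    # which would collapse lo to 1.0 after about 53 bits.
    from fractions import Fraction

    def decodable(lo, hi):
        """Return the symbol whose cell contains [lo, hi), or None if undecided."""
        k = 0
        while True:
            cell_hi = 1 - Fraction(1, 2 ** (k + 1))
            if lo >= cell_hi:   # interval lies past cell k; try the next one
                k += 1
                continue
            # the interval starts inside cell k; it decodes iff it also ends there
            return k if hi <= cell_hi else None

    lo, hi = Fraction(0), Fraction(1)
    for n in range(1, 41):
        lo = (lo + hi) / 2      # bit 1: keep the upper half of the interval
        print(f"after {n} one-bits: {decodable(lo, hi)}")  # always None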
MacKay discusses this in Chapter 6 of his free book, Information Theory, Inference, and Learning Algorithms.