You need to read about universal coding, e.g. start here:
http://en.wikipedia.org/wiki/Universal_code_(data_compression%29
I highly recommend Cover and Thomas’s book, Elements of Information Theory, a very readable intro to info theory. The point is that we don’t need to know the distribution the bits came from to do very well in the limit. (There are gains to be had in the regime before “in the limit,” but these track the kinds of gains you get in statistics when you move beyond asymptotic theory.)
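
To make the "don't need to know the distribution" point concrete, here is a rough sketch in Python. It is my own illustration, not something from the book or the Wikipedia page: it uses the Krichevsky-Trofimov (KT) sequential estimator, one standard universal scheme for binary sources, and compares its ideal code length with n times the empirical entropy, which is the best any coder that knew the bit frequencies in advance could do. The function names (kt_code_length, empirical_entropy_bits) and the test source (coin flips with p = 0.1) are made up for the example.

```python
import math
import random


def kt_code_length(bits):
    """Ideal code length (in bits) when each bit is coded with the KT estimate.

    The coder never sees the true distribution: it predicts the next bit
    from the counts so far, using P(next = b) = (count[b] + 1/2) / (n + 1).
    """
    counts = [0, 0]
    total = 0.0
    for b in bits:
        p = (counts[b] + 0.5) / (counts[0] + counts[1] + 1.0)
        total -= math.log2(p)
        counts[b] += 1
    return total


def empirical_entropy_bits(bits):
    """n * H(k/n): code length of the best fixed Bernoulli model in hindsight."""
    n = len(bits)
    k = sum(bits)
    if k == 0 or k == n:
        return 0.0
    q = k / n
    return -n * (q * math.log2(q) + (1 - q) * math.log2(1 - q))


def sample_bits(n, p, seed=0):
    """Hypothetical test source: n independent bits with P(1) = p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]


if __name__ == "__main__":
    for n in (100, 1000, 10000, 100000):
        bits = sample_bits(n, p=0.1)
        kt = kt_code_length(bits)
        best = empirical_entropy_bits(bits)
        # Per-bit overhead of not knowing p; it should shrink as n grows.
        print(n, round(kt / n, 4), round(best / n, 4), round((kt - best) / n, 5))
```

The overhead of the KT code over the hindsight-optimal code is at most (1/2) log2(n) + 1 bits in total, so the per-bit penalty for not knowing the distribution goes to zero as n grows. That extra log term is also where the finite-sample, beyond-asymptotics gains would show up.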