Unless you have a model that exactly describes how a given message was generated, its Shannon entropy is not known but only estimated… and typically estimated from what the current state-of-the-art compression algorithms achieve. So unless I've misunderstood, this seems like a circular argument.
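To make that concrete, here is roughly what "estimating entropy with a compressor" looks like in practice: compress the message with an off-the-shelf compressor and treat the compressed size as an upper bound on the entropy. A minimal Python sketch (zlib and the example inputs are arbitrary illustrative choices, not anything from the discussion):

    import os
    import zlib

    def compressed_bits_per_symbol(message: bytes) -> float:
        """Estimate bits/byte by compressing with a general-purpose compressor.

        This is only an upper bound on the entropy rate: a better compressor
        would report a smaller number, which is exactly the circularity
        being pointed out above.
        """
        compressed = zlib.compress(message, 9)
        return 8.0 * len(compressed) / len(message)

    if __name__ == "__main__":
        repetitive = b"abcabc" * 1500       # highly structured message
        random_ish = os.urandom(9000)       # incompressible bytes
        print("repetitive:", compressed_bits_per_symbol(repetitive), "bits/byte")
        print("random:    ", compressed_bits_per_symbol(random_ish), "bits/byte")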
I highly recommend Cover and Thomas's book, a very readable intro to information theory. The point is that we don't need to know the distribution the bits came from in order to do very well in the limit: universal codes approach the source's entropy rate regardless. (There are gains to be had in the regime before "in the limit," but those gains track the kinds of gains you get in statistics if you want to move beyond asymptotic theory.)
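To sketch why that works (this example is mine, not the commenter's): LZ78 incremental parsing, one of the universal schemes analyzed in Cover and Thomas, uses no model of the source at all, yet its code length of roughly c(n) * (log2 c(n) + log2 |alphabet|) bits, where c(n) is the number of parsed phrases, converges to n times the entropy rate for any stationary ergodic source. A small Python sketch with a made-up i.i.d. binary source (true entropy rate about 0.469 bits/symbol; convergence is slow, so the finite-n estimate will sit above that):

    import math
    import random

    def lz78_phrases(seq: str) -> int:
        """Number of phrases in the LZ78 incremental parse of seq."""
        seen = set()
        phrase = ""
        count = 0
        for symbol in seq:
            phrase += symbol
            if phrase not in seen:
                seen.add(phrase)
                count += 1
                phrase = ""
        if phrase:               # trailing partial phrase
            count += 1
        return count

    def lz78_bits_per_symbol(seq: str, alphabet_size: int = 2) -> float:
        """Rough LZ78 code length per symbol: each phrase costs about
        log2(#phrases) bits for the back-pointer plus log2(alphabet_size)
        bits for its final symbol. No source distribution is used."""
        c = lz78_phrases(seq)
        return c * (math.log2(max(c, 2)) + math.log2(alphabet_size)) / len(seq)

    if __name__ == "__main__":
        random.seed(0)
        # Hypothetical source: i.i.d. bits with P(1) = 0.1.
        bits = "".join("1" if random.random() < 0.1 else "0" for _ in range(200000))
        true_rate = -(0.1 * math.log2(0.1) + 0.9 * math.log2(0.9))
        print("LZ78 estimate:", lz78_bits_per_symbol(bits), "bits/symbol")
        print("entropy rate: ", true_rate, "bits/symbol")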
You need to read about universal coding, e.g. start here:
http://en.wikipedia.org/wiki/Universal_code_(data_compression)
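In the narrower sense that article uses (roughly: a prefix code for the positive integers whose expected length stays within a constant factor of the optimum for any monotonically decreasing distribution), the classic first example is Elias gamma coding. A minimal sketch, again just illustrative:

    def elias_gamma_encode(n: int) -> str:
        """Elias gamma code: (bit-length of n minus one) zeros, then n in binary."""
        if n < 1:
            raise ValueError("Elias gamma is defined for positive integers")
        binary = bin(n)[2:]
        return "0" * (len(binary) - 1) + binary

    def elias_gamma_decode(bits: str) -> int:
        """Decode a single Elias-gamma codeword from a bit string."""
        zeros = 0
        while bits[zeros] == "0":
            zeros += 1
        return int(bits[zeros:2 * zeros + 1], 2)

    if __name__ == "__main__":
        for n in (1, 2, 5, 17):
            code = elias_gamma_encode(n)
            assert elias_gamma_decode(code) == n
            print(n, "->", code)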