Now I’m confused. When we say “one bit of information”, we usually mean one bit about one particular item. If I say, “The cat in this box, which formerly could have been alive or dead, is dead,” that’s one bit of information. But if I say, “All of the cats in the world are now dead”, that’s surely more information, and must be more than one bit.
My first reaction was to say that it takes more information to specify “all the cats in the world” than to specify “my roommate’s cat, which she foolishly lent me for this experiment”. But it doesn’t.
(It certainly takes more work to enforce one bit of information when its domain is the entire Earth than when it applies only to the desk in front of you. Applying the same 37 bits of information to the attributes of every person in the entire world would be quite a feat.)
At the risk of stating the obvious: The information content of a datum is its surprisal, the negative base-2 logarithm of the prior probability that it is true. If I currently give a 1% chance that the cat in the box is dead, discovering that it is dead gives me -log2(0.01) ≈ 6.64 bits of information.
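For anyone who wants to check the arithmetic, here is a minimal Python sketch of the surprisal calculation. The function name `surprisal_bits` is my own invention for illustration, and the sketch assumes the prior is a single probability strictly greater than 0 and at most 1.

```python
import math

def surprisal_bits(prior_probability: float) -> float:
    """Bits of information gained on learning that a claim with this prior is true."""
    # A prior of 0 would mean infinite surprisal, so it is rejected here.
    if not 0.0 < prior_probability <= 1.0:
        raise ValueError("prior probability must be in (0, 1]")
    # Surprisal is the negative base-2 logarithm of the prior probability.
    return -math.log2(prior_probability)

# The 1% cat from the paragraph above.
print(round(surprisal_bits(0.01), 2))  # 6.64

# A cat that was equally likely to be alive or dead carries exactly one bit.
print(surprisal_bits(0.5))  # 1.0
```

Note that the number of bits depends only on the prior probability of the statement, not on how many cats the statement happens to mention.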