Imagine you have two unknown bits, generated uniformly at random. What’s the fastest way to learn them by asking yes or no questions?
1) Ask whether the first bit is 0, then do the same for the second bit. This way you always spend two questions.
2) First ask whether the bits are 00. If yes, you win. Otherwise ask the remaining questions in any way you like. This works out to 9⁄4 questions on average, which is worse than the first method.
Moral of the story: if you want to learn information as fast as possible, you must split the search space in equal parts. That’s the same as saying that a uniform distribution has maximum self-information, i.e. maximum entropy.
Imagine you have two unknown bits, generated uniformly at random. What’s the fastest way to learn them by asking yes or no questions?
1) Ask whether the first bit is 0, then do the same for the second bit. This way you always spend two questions.
2) First ask whether the bits are 00. If yes, you win. Otherwise ask the remaining questions in any way you like. This works out to 9⁄4 questions on average, which is worse than the first method.
Moral of the story: if you want to learn information as fast as possible, you must split the search space in equal parts. That’s the same as saying that a uniform distribution has maximum self-information, i.e. maximum entropy.