The entropy of a message ($H(X)$) is roughly proportional to the uniformity of its probability distribution ($p(x)$ for each possible $x$). If the message has just two possible values, $H(X)$ is greater exactly insofar as the split between their probabilities is close to 50⁄50.
$p(x_1)$ | $p(x_2)$ | $H$ (bits) |
0.5 | 0.5 | 1 |
0.7 | 0.3 | 0.88 |
0.8 | 0.2 | 0.72 |
0.9 | 0.1 | 0.47 |
0.95 | 0.05 | 0.29 |
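The table can be checked with a short sketch. It assumes the standard Shannon definition of entropy in bits, $H = -\sum_x p(x) \log_2 p(x)$, which the post does not spell out; the function name `binary_entropy` is purely illustrative.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a message with two values of probability p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no information
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


# Print one row per split, in the same column order as the table.
for p in (0.5, 0.7, 0.8, 0.9, 0.95):
    print(f"{p:.2f} | {1 - p:.2f} | {binary_entropy(p):.2f}")
```

Rounded to two decimals, the printed entropies match the table: the closer the split is to 50⁄50, the closer the entropy is to its 1-bit maximum.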
The entropy of a message is, intuitively, the amount of information you expect to gain from receiving it. Thus you can learn more efficiently by seeking messages generated in higher-entropy ways.
Assuming that people ask questions to get information, and that questions are strictly yes-or-no, or otherwise have two main answers (as in “which direction — east or west — leads to our destination?”), the best questions are those which separate options of roughly equal prior probability to the questioner.
The prior probability does not necessarily match the intuitive (but kinda meaningless) “objective probability”. E.g. with no specific information for the scenario, the “which direction” question is a 50⁄50 split, but if you’re near the east coast of an island containing an otherwise-unknown destination, your priors should be biased in favour of “west”.
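As a rough illustration of the maxim, the sketch below scores a few candidate yes-or-no questions by the entropy of their answer under the questioner's prior and picks the highest-scoring one. The prior over directions and the candidate questions are made-up numbers for the island scenario, and the helper names are hypothetical.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution (zero terms are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def question_entropy(prior, yes_set):
    """Entropy of the answer to the question 'is it one of yes_set?' under the prior."""
    p_yes = sum(prob for option, prob in prior.items() if option in yes_set)
    return entropy_bits([p_yes, 1.0 - p_yes])


# Hypothetical prior for someone standing near the east coast of the island.
prior = {"north": 0.1, "east": 0.1, "south": 0.3, "west": 0.5}

# Hypothetical candidate questions, each defined by the options that mean "yes".
questions = {
    "is it north?": {"north"},
    "is it west?": {"west"},
    "is it north or east?": {"north", "east"},
}

best = max(questions, key=lambda q: question_entropy(prior, questions[q]))
print(best)
```

Under this prior, "is it west?" splits the probability mass 50⁄50 and so carries a full bit, while "is it north?" is a 10⁄90 split worth under half a bit. A different prior would pick a different question, which is the point of the paragraph above.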
There are at least two uses of this maxim. You can use it yourself to guide your choice of questions. You can assume others follow it and, when they seem to violate it by asking weird binary questions, find that at least one of the maxim’s assumptions is false:
1. the questioner doesn’t seek information (as in rhetorical questions)
2. much of the questioner’s response-probability goes to answers other than the main binary (as in the seemingly-binary “is the destination in place A, or place B?”, when they really expect that they might get the longer answer “actually, C”)
3. the questioner doesn’t know enough information theory
4. the questioner’s priors are very different from what you expect (as in “are you homo or hetero?”, when they have information that favours the less common option)
Alas, I don’t (yet) know the relative frequencies of these confusion modes.
Option 5: the questioner is optimizing a metric other than what appears to be the post’s implicit one, “get max info with minimal number of questions, ignoring communication overhead”. IMHO that is a weird metric to optimize to begin with. Not only does it ignore the length and complexity of each question, it also ignores things like maintaining the answerer’s willingness to keep answering questions, not annoying the answerer, and ensuring proper context so that a question is not misunderstood. And this does not even take into account the possibility that, while the questioner does care about getting the information, they might simultaneously care about other things.