Very interesting article! It seemed very surprising to me that the information entropy can be interpreted as the (minimum average) number of yes/no-questions needed to fully determine the state of a system. Especially since the value of the entropy depends on the units (i.e. what logarithm base you use). Is the use of base two related to the fact that the answers can be either ‘yes’ or ‘no’ (binary)?
Another question. Suppose you have a system X in 4 possible states {X1,X2,X3,X4} with probabilities {3/8,2/8,2/8,1/8} respectively. The information entropy is about 1.91 bits. But I couldn't find any series of yes/no questions to ask that would give me 1.91 questions on average. An average of 2 questions is the best I could do. That is the same as for the uniform distribution {1/4,1/4,1/4,1/4}, even though we have more information in the present case. Am I not looking hard enough?
I'm really trying to get a grasp on the concept of information entropy, so I hope someone can answer my questions. Great blog in any case. I'm glad I found it!
EDIT: Welcome to LessWrong. You may want to introduce yourself in the welcome thread.
Especially since the value of the entropy depends on the units (i.e. what logarithm base you use). Is the use of base two related to the fact that the answers can be either ‘yes’ or ‘no’ (binary)?
Precisely.
{3/8,2/8,2/8,1/8} … 1.91 bits … 2 questions.
Am I not looking hard enough?
No, there really isn't such a series of questions. A simpler example is just taking two options with unequal probabilities (take {1/8, 7/8} for concreteness). Again, you have to ask one question even though the entropy is less than one bit (0.54 bits, if I did the math right). However, if you have many copies of this system, you can ask questions about the whole collection at once and (on average over all possible states of the collection) pin down every copy with less than one question per copy; in the limit of infinitely many copies, the average approaches the entropy.
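If you want to check those figures, here's a minimal Python sketch (the `entropy` helper is just a name I'm using for illustration, not anything from the article) that reproduces the 0.54-bit figure here and the 1.91 bits from the question above:

```python
from math import log2

def entropy(ps):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in ps if p > 0)

print(entropy([1/8, 7/8]))            # ~0.544 bits (the two-option example here)
print(entropy([3/8, 2/8, 2/8, 1/8]))  # ~1.906 bits (the four-state system in the question)
```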
E.g., for two systems, labeling the first state as 0 and the second as 1, you have the following tree:
Are they both 1?
a. Yes: stop, state is (1,1), 1 question asked, with probability 49/64.
b. No: continue.
Is the first one 1?
a. Yes: stop, state is (1,0), 2 questions asked, with probability 7/64.
b. No: continue.
Is the second one 1?
a. Yes: stop, state is (0,1), 3 questions asked, with probability 7/64.
b. No: stop, state is (0,0), 3 questions asked, with probability 1/64.
Total average questions = (1*49 + 2*7 + 3*7 + 3*1)/64 = 87/64 ≈ 1.36 < 2, but greater than 2*0.544 ≈ 1.09.
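To see the convergence numerically, here's a rough sketch (again, the helper names are my own, not from the original comments) that uses Huffman's algorithm to build an optimal question tree for blocks of n independent copies of the {1/8, 7/8} system. It reproduces the 87/64 figure for two copies, and it also confirms that for the four-state system in the question, an average of two questions really is the best possible:

```python
import heapq
from itertools import product
from math import log2, prod

def entropy(ps):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in ps if p > 0)

def huffman_avg_questions(ps):
    """Average number of yes/no questions for an optimal (Huffman) question tree.

    Each merge of two subtrees adds one question for their combined probability
    mass, so summing the merged masses gives the expected number of questions.
    """
    heap = list(ps)
    heapq.heapify(heap)
    avg = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        avg += a + b
        heapq.heappush(heap, a + b)
    return avg

# The four-state system from the question: two questions on average is optimal.
print(huffman_avg_questions([3/8, 2/8, 2/8, 1/8]))  # 2.0, vs entropy ~1.91 bits

# Blocks of n independent copies of the two-state {1/8, 7/8} system.
p = [1/8, 7/8]
print(entropy(p))  # ~0.544 bits per copy
for n in (1, 2, 4, 8):
    joint = [prod(combo) for combo in product(p, repeat=n)]
    q = huffman_avg_questions(joint)
    print(n, q, q / n)  # per-copy average falls from 1 toward the entropy
```

For n = 1 the per-copy average is 1 question; for n = 2 it is 87/128 ≈ 0.68 (the tree above counted 87/64 per pair); and it keeps dropping toward ≈ 0.544 as n grows, which is the sense in which the entropy is the minimum average number of yes/no questions.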