And it brings to mind decision trees, which are essentially an automated way of playing Twenty Questions with the universe. To avoid over-fitting your training data, once you've constructed a complete decision tree, you go back and prune it, removing questions that fall below a certain threshold of usefulness.
The usual way to do this is to look at the expected reduction in entropy (the information gain) from asking a particular question. If it doesn't reduce the entropy much, don't bother asking. If you know that an animal is a bird, you don't gain much by asking "Is it an Emperor penguin?". You would reduce the entropy in your pool of possible birds more by asking whether it's a songbird, or whether its average adult wingspan is more than 10 cm.
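Here's a minimal sketch of that criterion in Python. This isn't anyone's actual implementation, and the bird pool, feature names, and numbers below are all invented for illustration; the point is just that information gain is the parent pool's entropy minus the weighted entropy of the child pools a question creates:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a pool of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(pool, question):
    """Expected entropy reduction from splitting `pool` on a yes/no question.

    `pool` is a list of (features, label) pairs; `question` is a predicate
    over the features.
    """
    yes = [label for feats, label in pool if question(feats)]
    no = [label for feats, label in pool if not question(feats)]
    if not yes or not no:
        return 0.0  # the question doesn't actually split the pool
    parent = entropy([label for _, label in pool])
    children = (len(yes) / len(pool)) * entropy(yes) + (len(no) / len(pool)) * entropy(no)
    return parent - children

# Invented toy pool of birds: (features, species) pairs.
birds = [
    ({"songbird": True,  "wingspan_cm": 25},  "sparrow"),
    ({"songbird": True,  "wingspan_cm": 30},  "robin"),
    ({"songbird": False, "wingspan_cm": 200}, "emperor penguin"),
    ({"songbird": False, "wingspan_cm": 180}, "albatross"),
]

# A question that isolates a single bird gains less than one that splits
# the pool evenly, and the gap widens as the pool grows.
print(information_gain(birds, lambda f: f["wingspan_cm"] > 190))  # ~0.81 bits
print(information_gain(birds, lambda f: f["songbird"]))           # 1.0 bits
```

Pruning then amounts to walking back over the finished tree and cutting any subtree whose question scores below your chosen gain threshold.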
SarahC’s quote is not only clever, but also supported by solid math and practical application.
Very quotable