Hypothesis Space Entropy
In Why quantitative finance is so hard I explained why the entropy of your dataset must exceed the entropy of your hypothesis space. I used a simple hypothesis space of equally likely hypotheses, each with the same number of tunable parameters. Real life is not usually so homogeneous.
No Tunable Parameters
Consider an inhomogeneous hypothesis space with zero tunable parameters. Instead of $E = \log_2 N$, which works for homogeneous hypothesis spaces of $N$ equally likely hypotheses, we must use a more complicated entropy equation:

$$E = -\sum_{i=1}^{N} p_i \log_2 p_i$$

where $p_i$ is the probability of hypothesis $i$.
This equation makes intuitive sense. It vanishes when one $p_i$ equals 1 and all the other $p_j$ equal 0. Our equation is extremized when all the $p_i$ are equal at $p_i = \frac{1}{N}$. $E = \log_2 N$ is the maximal case, attained when $p_i = \frac{1}{N}$ for all $i$.[1]
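A quick numerical check of both claims, as a minimal Python sketch (the `entropy` helper and the eight-hypothesis example are mine, not from the post):

```python
import math

def entropy(p):
    """Shannon entropy in bits: E = -sum(p_i * log2(p_i)), treating 0*log2(0) as 0."""
    return sum(q * math.log2(1.0 / q) for q in p if q > 0)

N = 8
certain = [1.0] + [0.0] * (N - 1)   # one hypothesis has probability 1
uniform = [1.0 / N] * N             # all hypotheses equally likely

print(entropy(certain))  # 0.0 bits: the entropy vanishes
print(entropy(uniform))  # 3.0 bits: the maximum, log2(8)
```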
With Tunable Parameters
Suppose each hypothesis $i$ has tunable parameters contributing $E_i$ bits of internal entropy. We can plug $E_i$ into our entropy equation:

$$E = \sum_{i=1}^{N} p_i \left( E_i - \log_2 p_i \right)$$
Our old equation is just the special case where all the $p_i$ are homogeneous and the $E_i$ are homogeneous too.
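Spelling that special case out (my own algebra and notation, not the post's): setting $p_i = \frac{1}{N}$ and $E_i = E_{\text{param}}$ for every $i$ gives

$$E = \sum_{i=1}^{N} \frac{1}{N}\left(E_{\text{param}} - \log_2 \frac{1}{N}\right) = E_{\text{param}} + \log_2 N,$$

the entropy of the shared tunable parameters plus the entropy of choosing among $N$ equally likely hypotheses.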
We have so far treated $E_i$ as representative of each hypothesis’s tunable parameters. More generally, $E_i$ represents each hypothesis’s internal entropy. If we think of hypotheses as a weighted tree, $E_i$ is what you get when you iterate one level down the tree. Our variable $H$ identifies the root of the tree. Suppose the $i$th branch of the next level down is called $H_i$.
We can define the entropy of the rest of the tree with a recursive equation:

$$E(H) = \sum_i p_i \left( E(H_i) - \log_2 p_i \right)$$
There are two parts to this equation: the recursive component $\sum_i p_i E(H_i)$ and the branching component $-\sum_i p_i \log_2 p_i$.
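Here is a minimal Python sketch of the recursive equation (the `(probability, subtree)` list encoding and the example numbers are my own, not from the post):

```python
import math

def tree_entropy(node):
    """E(H) = sum_i p_i * (E(H_i) - log2(p_i)).

    A node is either a bare number (a leaf's internal entropy in bits)
    or a list of (p_i, child) pairs whose probabilities sum to 1.
    """
    if not isinstance(node, list):
        return float(node)
    recursive = sum(p * tree_entropy(child) for p, child in node if p > 0)
    branching = -sum(p * math.log2(p) for p, _ in node if p > 0)
    return recursive + branching

# Two hypotheses: a simple one (2 bits of parameters) with prior 0.9
# and a complex one (50 bits of parameters) with prior 0.1.
H = [(0.9, 2), (0.1, 50)]
print(tree_entropy(H))  # 0.9*2 + 0.1*50 + H(0.9, 0.1) ≈ 7.27 bits
```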
Branching component
The branching component $-\sum_i p_i \log_2 p_i$ is maximized when $p_i = \frac{1}{N}$ for all $i$.
The branching component tops out at $\log_2 N$, where $N$ is the branching factor. It can never contribute a massive quantity of entropy to your hypothesis space because it is limited to $\log_2 N$ bits of entropy per level of the tree.
The branching factor is mostly unimportant. The bulk of our entropy comes from the recursive component.
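For a sense of scale (my own illustrative numbers): even a branching factor of $N = 10^6$ contributes at most $\log_2 10^6 \approx 20$ bits per level, while a single hypothesis with a thousand 32-bit tunable parameters carries on the order of $32{,}000$ bits of internal entropy.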
Recursive component
Fix $p_i$ at a positive value. There is no limit to how big $E(H_i)$ can become. You can make it arbitrarily large just by adding parameters. Consequently $E(H)$ can become arbitrarily large too. In real-world situations we should expect the recursive components of our hypothesis space to dominate the branching components.
If $p_i$ vanishes then the recursive component $p_i E(H_i)$ disappears. This might explain why human minds like to round “extremely unlikely” to “impossible” when $E(H_i)$ is large. It removes lots of entropy from our hypothesis space while still being right almost all of the time. This may be related to synaptic pruning.
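A numerical illustration (my numbers, not the post's): if a branch has prior $p_i = 0.001$ and internal entropy $E(H_i) = 10^6$ bits, rounding it to impossible deletes its recursive contribution $p_i \, E(H_i) = 1000$ bits from the hypothesis space while costing at most a $0.1\%$ chance of having discarded the true hypothesis.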
Lessons for Hypothesis Space Design
Once again, we have confirmed that having hypotheses with lots of parameters is a worse problem than having lots of hypotheses to choose between. More generally, one or more hypotheses with exceptionally high entropy dominate the total entropy of your hypothesis space. If you want better priors then the first step of your optimization should be to eliminate these complex subtrees from your hypothesis space.
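As a concrete sketch of that optimization (entirely illustrative; the helpers and numbers are mine), prune the subtree with the largest entropy contribution $p_i \, E(H_i)$ and renormalize:

```python
import math

def tree_entropy(node):
    """E(H) = sum_i p_i * (E(H_i) - log2(p_i)); a bare number is a leaf's internal entropy in bits."""
    if not isinstance(node, list):
        return float(node)
    return sum(p * (tree_entropy(child) - math.log2(p)) for p, child in node if p > 0)

def prune_heaviest(node):
    """Drop the child with the largest contribution p_i * E(H_i), then renormalize the rest.

    Returns (pruned_tree, probability_mass_discarded).
    """
    worst = max(range(len(node)), key=lambda i: node[i][0] * tree_entropy(node[i][1]))
    lost_p = node[worst][0]
    kept = [(p, child) for i, (p, child) in enumerate(node) if i != worst]
    total = sum(p for p, _ in kept)
    return [(p / total, child) for p, child in kept], lost_p

# Three hypotheses: two simple ones and a 1%-likely hypothesis with 5000 bits of parameters.
H = [(0.90, 2), (0.09, 10), (0.01, 5000)]
pruned, lost = prune_heaviest(H)
print(round(tree_entropy(H), 1))       # ≈ 53.2 bits, dominated by the complex hypothesis
print(round(tree_entropy(pruned), 1))  # ≈ 3.2 bits once it is pruned
print(lost)                            # 0.01 chance the discarded hypothesis was the true one
```

Dropping that single subtree removes roughly 50 bits of hypothesis-space entropy at the cost of a 1% chance of having excluded the true hypothesis.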
- Proof: maximize $-\sum_i p_i \log_2 p_i$ subject to $\sum_i p_i = 1$. Setting the derivative of the Lagrangian to zero gives $-\log_2 p_i - \frac{1}{\ln 2} - \lambda = 0$, the same equation for every $i$, so all the $p_i$ are equal at $\frac{1}{N}$ and $E = \log_2 N$. ↩︎
I am confused about what is meant by a ‘hypothesis’.
Is this a probability distribution? What is the mathematical object that you denote by hypothesis?
It’s a probability distribution. A hypothesis space is a probability distribution over probability distributions.