the structured space will always give strictly better predictions than the Socratic hypothesis.
I don’t think so. Suppose that in the H/T example the Socratic hypothesis says P(H) = P(T) = 3. Since it doesn’t have to sum to 1, it will always do better (assign a higher likelihood to whatever is observed) than any hypothesis that has to be normalized.
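A minimal sketch of the normalization point, assuming hypotheses are compared by the likelihood they assign to an observed H/T sequence. The function names are hypothetical, and using the maximum-likelihood i.i.d. coin as "the best normalized hypothesis" is an illustrative choice. Any normalized distribution over sequences gives each sequence probability at most 1, while a weight of 3 per outcome multiplies the score by 3 on every flip.

```python
from fractions import Fraction


def normalized_likelihood(seq: str, p_heads: Fraction) -> Fraction:
    """Likelihood of an H/T sequence under an i.i.d. coin with P(H) + P(T) = 1."""
    out = Fraction(1)
    for flip in seq:
        out *= p_heads if flip == "H" else 1 - p_heads
    return out


def best_normalized_likelihood(seq: str) -> Fraction:
    """Best an i.i.d. normalized coin can do: maximum-likelihood P(H) = fraction of heads."""
    if not seq:
        return Fraction(1)
    return normalized_likelihood(seq, Fraction(seq.count("H"), len(seq)))


def socratic_likelihood(seq: str, weight: Fraction = Fraction(3)) -> Fraction:
    """Unnormalized score with P(H) = P(T) = 3: every flip multiplies the score by 3."""
    return weight ** len(seq)


seq = "HHTHTTTHHH"
print(best_normalized_likelihood(seq))  # 11664/9765625, i.e. below 1
print(socratic_likelihood(seq))         # 59049, i.e. 3**10
```

Whatever the data, the normalized score stays at or below 1 while the unnormalized one grows as 3^n, so the comparison says nothing unless every hypothesis is held to the same normalization.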
I’m not sure what you mean by “structured hypotheses” here, though.
In the case where you get 1 heads, then 5 tails, then 25 heads, and so on, and you are working with the assumption that the flips are independent, the Bayesian hypothesis will never converge, but it will still give better predictions than the Socratic hypothesis most of the time. In particular, once it is halfway through one of the powers-of-five runs, it will assign P > .5 to the correct prediction every time. And if you relax that assumption so that each flip can depend on the previous flip, it will arrive at a hypothesis (the next flip will be the same as the last one) that actually performs VERY well, and does converge.
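A small simulation of this case, as a sketch under stated assumptions: the Bayesian hypothesis under independence is modeled here as an i.i.d. coin with a uniform prior on P(H) (Laplace's rule of succession), and the dependent-flip hypothesis as the fixed rule "next flip = last flip". All function names are hypothetical. The script checks the halfway claim for every run of length 5 or more, measures the hit rate of the same-as-last rule, and records the observed fraction of heads at the end of each run.

```python
def run_sequence(num_runs: int) -> list[str]:
    """Hypothetical generator: 1 H, 5 T, 25 H, 125 T, ... (run k has length 5**k)."""
    seq: list[str] = []
    for k in range(num_runs):
        seq.extend(["H" if k % 2 == 0 else "T"] * 5 ** k)
    return seq


def laplace_p_heads(heads: int, total: int) -> float:
    """Assumed i.i.d. model with a uniform prior on P(H): Laplace's rule of succession."""
    return (heads + 1) / (total + 2)


def evaluate(num_runs: int) -> tuple[bool, float, list[float]]:
    seq = run_sequence(num_runs)
    heads = 0              # heads seen before the current flip
    halfway_ok = True      # i.i.d. model gives P > .5 to the actual flip from the
                           # middle of every run of length >= 5 onward
    markov_hits = 0        # correct guesses by "next flip = last flip"
    freq_at_run_end = []   # observed fraction of heads after each run
    i = 0
    for k in range(num_runs):
        length = 5 ** k
        for j in range(length):
            flip = seq[i]
            p_h = laplace_p_heads(heads, i)
            p_actual = p_h if flip == "H" else 1 - p_h
            if k >= 1 and j >= length // 2 and p_actual <= 0.5:
                halfway_ok = False
            if i > 0 and seq[i - 1] == flip:
                markov_hits += 1
            if flip == "H":
                heads += 1
            i += 1
        freq_at_run_end.append(heads / i)
    return halfway_ok, markov_hits / len(seq), freq_at_run_end


ok, markov_acc, freqs = evaluate(7)
print(ok)                          # True
print(round(markov_acc, 4))        # 0.9996
print([round(f, 3) for f in freqs])
```

The fraction of heads keeps swinging between roughly 5/6 and 1/6, which is the non-convergence described above, while the same-as-last rule misses only the very first flip and the first flip of each new run.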
By “structured” I mean that I have a principled way of determining P(Evidence|Hypothesis); with the Socratic hypothesis I only have unprincipled ways of determining it.
I’m not sure what you mean by “normalized”, unless you mean that the Socratic hypothesis always gives probability 1 to the observed evidence, in which case it will dominate even the correct hypothesis whenever there is uncertainty (i.e. whenever the correct hypothesis gives the evidence probability less than 1).
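A minimal sketch of the dominance point, assuming just two hypotheses with equal prior weight: a structured coin hypothesis whose P(E|H) is fixed by its parameter before any data arrive, and a hypothetical "retrofit" hypothesis that gives whatever was observed probability 1. The names and the value P(H) = 0.7 for the correct hypothesis are illustrative. Every flip the correct hypothesis was uncertain about multiplies its posterior odds by a factor below 1.

```python
from fractions import Fraction


def structured_likelihood(seq: str, p_heads: Fraction) -> Fraction:
    """P(E|H) for an i.i.d. coin hypothesis; determined by p_heads before seeing E."""
    out = Fraction(1)
    for flip in seq:
        out *= p_heads if flip == "H" else 1 - p_heads
    return out


def retrofit_likelihood(seq: str) -> Fraction:
    """Hypothetical 'hypothesis' that gives the observed evidence probability 1."""
    return Fraction(1)


def posterior_of_correct(seq: str, p_true: Fraction,
                         prior_correct: Fraction = Fraction(1, 2)) -> Fraction:
    """Posterior of the correct coin hypothesis when its only rival is the retrofit one."""
    weighted_correct = prior_correct * structured_likelihood(seq, p_true)
    weighted_retrofit = (1 - prior_correct) * retrofit_likelihood(seq)
    return weighted_correct / (weighted_correct + weighted_retrofit)


p_true = Fraction(7, 10)  # illustrative "correct" coin
seq = "HHTHHHTH"
for n in range(len(seq) + 1):
    # posterior of the correct hypothesis after the first n flips
    print(n, float(posterior_of_correct(seq[:n], p_true)))
```

The posterior of the correct hypothesis falls on every flip; the two would only tie if the correct hypothesis were certain of the evidence (P(E|H) = 1).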