but I’m not sure to what extent such a prior (i.e. the pattern of likely theories, of which simplicity is one factor) can rationally be determined.
A theory that takes one more bit is less than half as likely. Either that, or all finite theories have infinitesimal likelihoods. I can’t tell you how much less than half, and I can’t tell you which compression algorithm you’re using. Trying to program the compression algorithm only means that the language you just wrote it in is the algorithm.
Technically, the extra-bit rule only holds in the limit as the amount of data goes to infinity, but that caveat is equivalent to the compression-algorithm one.
I also assign zero probability to anything involving infinities, because otherwise the paradoxes of infinity would break probability theory and ethics.
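For what it’s worth, the normalisation argument behind the half-as-likely claim can be checked numerically. A minimal sketch (the function name and the length cutoff are mine, purely illustrative):

    # Why each extra bit must cost more than a factor of two: there are
    # 2**n binary "theories" of length n, so giving each of them weight
    # c**-n leaves total mass sum_n (2/c)**n, which is finite only for
    # c > 2.
    def total_mass(c, max_len=1000):
        """Sum of 2**n * c**-n over theory lengths n = 1..max_len."""
        return sum((2.0 / c) ** n for n in range(1, max_len + 1))

    for c in (1.9, 2.0, 2.1, 4.0):
        print(c, total_mass(c))
    # c <= 2 keeps growing as max_len grows (the prior cannot be
    # normalised); c > 2 converges to a finite constant you can divide out.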
That’s the extent to which it can be done.
Interesting. Is it correct to say that the bit-based prior is a consequence of creating an internally consistent formalisation of the aesthetic heuristic of preferring simpler structures to complex ones?
If so, I was wondering whether it could be extended to reflect other aesthetics. For example, if an experiment produces a single result that is inconsistent with an existing simple physics theory, the simplest theory that explains the data may be to treat the result as an isolated exception; aesthetically, however, we find it more plausible that the exception is evidence of a larger theory of which this sample is one part.
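To make that tension concrete, here is a toy bit of bookkeeping under the 2^-bits prior; all three numbers are invented purely for illustration:

    old_theory      = 100   # bits to state the existing simple theory
    exception_patch = 30    # bits to hard-code the anomalous result
    larger_theory   = 135   # bits for a deeper theory with no exception

    # Prior odds of (old theory + patch) over the larger theory:
    odds = 2.0 ** -(old_theory + exception_patch) / 2.0 ** -larger_theory
    print(odds)  # 32.0: the patched theory wins on simplicity alone...
    # ...but each further anomaly costs it another ~30 bits, while the
    # larger theory absorbs them at no extra cost, so a few more such
    # results flip the odds.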
In contrast, when attempting to understand the rules of a human system (e.g. a bureaucracy), a theory that lacks exceptions seems unlikely (“that’s a little too neat”). Indeed, stated informally the phrase might go “in my experience, that’s a little too neat”, implying that we formulate priors from patterns learned through experience. In the case of the bureaucracy, this may stem from a probabilistic understanding of the types of system that result from a particular ‘maker’ (i.e. politics).
However, this moves the problem to one of classifying contexts and determining which contexts are relevant. If that classification is considered part of the theory, it may considerably increase the theory’s complexity, so that theories which ignore context are always preferred. Unless, of course, the theory is complete (incorporating all contexts), in which case the simplest theory may share these contextual models and thus become the universal simplest model. It would therefore not be rational to apply Kolmogorov complexity to a problem in isolation; i.e. probability and reductionism are not compatible.
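One crude way to see the shared-context point: using an off-the-shelf compressor as a stand-in for Kolmogorov complexity (a standard but very lossy approximation), two rules that share background compress far better jointly than in isolation. The strings and the helper below are invented for the illustration:

    import zlib

    context = b"form 27B/6 must be countersigned by the duty officer " * 20
    rule_a = context + b"except on alternate Tuesdays"
    rule_b = context + b"except during fire drills"

    def k(s: bytes) -> int:
        """Compressed length: a rough upper bound on description length."""
        return len(zlib.compress(s, 9))

    print("in isolation:", k(rule_a) + k(rule_b))
    print("jointly:     ", k(rule_a + rule_b))
    # The joint description is much shorter because the shared context is
    # paid for once; judging each rule's complexity in isolation
    # double-counts it.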