There is a subtlety here. Large updates from extremely unlikely to quite likely are common. Large updates from quite likely to exponentially sure are harder to come by. Let's pick an extreme example: suppose a friend builds a coin-tossing robot. The friend sends you a 1 MB file, claiming it is the sequence of coin tosses. Your probability assigned to this particular sequence being the way the coin landed jumps straight from $2^{-8,000,000}$ to somewhere between 1% and 99% (depending on the friend's level of trustworthiness and engineering skill). Note that the probability you assign to several other sequences increases too. For example, it's not that unlikely that your friend accidentally put a NOT in their code, so your probability on the exact opposite sequence should also be $\gg 2^{-8,000,000}$. It's not that unlikely that they reversed the sequence, or XORed it with pi, or ... Do you see the pattern? You are assigning high probability to the sequences with low conditional Kolmogorov complexity relative to the existing data, as the sketch below illustrates.
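A minimal sketch of that point, with a toy 16-bit string standing in for the 1 MB file and made-up description lengths: the hypotheses that get boosted are exactly the short *transformations* of the received data, because each one is describable in a handful of bits given the file, regardless of how long the file itself is.

```python
# Toy sketch (illustrative numbers, not real Kolmogorov complexities).
# Given the received bit string, the sequences that get non-negligible
# probability are simple transformations of it: their conditional
# description length given the file is tiny, even though the file is huge.

received = "0110100110010110"  # stand-in for the 1 MB file

candidates = {
    "as sent":     received,
    "bitwise NOT": "".join("1" if b == "0" else "0" for b in received),
    "reversed":    received[::-1],
}

# Crude prior weights ~ 2^-k, where k is a rough description length (in bits)
# of the *transformation*, not of the 8,000,000-bit sequence itself.
rough_description_bits = {"as sent": 1, "bitwise NOT": 5, "reversed": 5}

for name, seq in candidates.items():
    weight = 2 ** -rough_description_bits[name]
    print(f"{name:12s} weight ~ {weight:.3f}   sequence: {seq}")
```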
Now think about what it would take to get a probability of $1-2^{-8,000,000}$ on the coin landing that particular sequence. All sorts of wild and wacky hypotheses have probability $>2^{-8,000,000}$: from boring stuff like a dodgy component or other undetected bug, to more exotic hypotheses like aliens tampering with the coin-tossing robot, or dark lords of the matrix directly controlling your optic nerve. You can't get this level of certainty about anything, ever. (Modulo concerns about what it means to assign $p<1$ to probability theory itself.)
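A quick worked version of that ceiling, under an assumed (and generous) prior: even if you only give one-in-a-million credence to "there was a sign-flip bug or some tampering somewhere", your confidence in the exact sequence is capped at about 20 bits short of certainty, nowhere near the 8,000,000 bits you would need.

```python
from math import log2

# Sketch of the ceiling argument; p_bug is an illustrative assumption.
p_bug = 1e-6                    # assumed total prior on bug / tampering stories
max_confidence = 1 - p_bug      # best case: everything else ruled out entirely

residual_doubt_bits = -log2(1 - max_confidence)  # ~19.9 bits of doubt remain
target_bits = 8_000_000                          # bits of doubt you'd need to clear

print(f"residual doubt ~ 2^-{residual_doubt_bits:.1f}, target was 2^-{target_bits}")
```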
You can easily update from exponentially close to 0, but you can't update to exponentially close to 1. This may have something to do with there being exponentially many very unlikely theories to start off with, but only a few likely ones.
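The asymmetry is easy to see in log-odds, with illustrative numbers: one strong piece of evidence (the friend's file) hauls the hypothesis up from $2^{-8,000,000}$ to "quite likely", but getting to $1-2^{-8,000,000}$ would require roughly another 8,000,000 bits of evidence against every surviving rival hypothesis, and those rivals start with priors far above $2^{-8,000,000}$.

```python
# Sketch in log2 odds (all numbers are illustrative assumptions).

prior_log_odds = -8_000_000        # log2 odds for "exactly this sequence"
evidence_bits = 7_999_998          # the friend's file is worth almost all of it

posterior_log_odds = prior_log_odds + evidence_bits   # = -2
posterior = 1 / (1 + 2 ** -posterior_log_odds)
print(f"posterior ~ {posterior:.2f}")                 # ~0.20, i.e. 'quite likely'

# Reaching 1 - 2^-8,000,000 would mean posterior_log_odds ~ +8,000,000:
# another ~8,000,000 bits, this time against bugs, tampering, and
# simulation stories whose prior odds are nowhere near that low.
```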
If you have 3 theories that predict much the same observations, and all other theories predict something different, you can easily update to "probably one of these 3". But you can't tell those 3 apart. In AIXI, any Turing machine has a parade of slightly more complex, slightly less likely Turing machines trailing along behind it. The hypothesis "all the primes, and Graham's number" is only slightly more complex than "all the primes", and is very hard to rule out.
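A sketch of that "trailing parade" under a Solomonoff-style $2^{-K}$ prior, with made-up program lengths: two hypotheses that agree on every observation so far have a likelihood ratio of exactly 1, so their posterior ratio stays pinned at the prior ratio $2^{-(K_B - K_A)}$, small but never exponentially small, until the predictions actually diverge.

```python
# Illustrative program lengths, not real Kolmogorov complexities.
K_A = 100   # "output all the primes"
K_B = 108   # "output all the primes, and Graham's number at step 10^100"

prior_ratio = 2 ** -(K_B - K_A)   # ~0.004: small, nowhere near 2^-8,000,000

observations = 10_000             # primes observed so far
likelihood_ratio = 1.0            # both hypotheses predicted every one of them

posterior_ratio = prior_ratio * likelihood_ratio
print(f"P(B)/P(A) after {observations} observations: {posterior_ratio:.4f}")
```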