One notable absence is the Solomonoff prior, where you weight the predictions of prefix-free TMs by $2^{-K}$ to get a probability distribution. Related would be approximations like MML (minimum message length) prediction.
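For concreteness, one standard way to write this (my gloss; the comment only gives the $2^{-K}$ weighting):

$$ M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)} $$

where $U$ is a universal prefix-free machine, $\ell(p)$ is the length of program $p$ in bits, and the sum runs over programs whose output begins with $x$. By Kraft's inequality for prefix-free codes, the weights sum to at most 1, so $M$ is a semimeasure rather than a normalized distribution.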
Another nitpick would be that Shannon entropy is defined for distributions, not for raw strings of data, so you also have to fix the inference process you're using to extract probabilities from the data.
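As a minimal sketch of this point (my own example, not from the post), the same string gets two different plug-in entropies depending on which inference procedure you fix:

```python
from collections import Counter
from math import log2

def plugin_entropy(symbols):
    """Shannon entropy (in bits) of the empirical ("plug-in")
    distribution obtained by frequency counting -- one particular
    choice of inference procedure, not the only one."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

data = "abababababababab"

# Inference choice 1: model each character as an i.i.d. draw.
h_chars = plugin_entropy(list(data))             # 1.000 bit/char

# Inference choice 2: model non-overlapping pairs as the i.i.d. unit.
pairs = [data[i:i + 2] for i in range(0, len(data) - 1, 2)]
h_pairs = plugin_entropy(pairs)                  # 0.000 bits/pair

print(f"per-char entropy: {h_chars:.3f}")
print(f"per-pair entropy: {h_pairs:.3f}")

# Same raw string, two different numbers: the entropy belongs to the
# distribution you inferred, not to the string itself.
```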
These are both great points and are definitely going to be important parts of where the story is going! We probably could have done a better job with the explication, especially on that last point, so thanks. One way to think about it: what are the most useful ways to convert data into distributions, and what do those distributions tell us about the data-generating process? That is what the next post will be about.
Interested to see what's next.