The universal distribution/prior is lower semi-computable, meaning there is a single Turing machine that approximates it from below, converging to it in the limit. Also, there is a probabilistic Turing machine that induces the universal distribution. So there is a rather clear sense in which one can “use the universal distribution.”
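For concreteness, here is a minimal Python sketch of the dovetailing structure behind lower semi-computability, using an invented toy “machine” (an assumption for illustration only; the real universal machine is what makes the result interesting, the toy one just makes the enumeration visible):

```python
from itertools import product

def run(program, steps):
    """Toy stand-in for a universal machine (hypothetical, for
    illustration): a program is a bit tuple that 'halts' after
    len(program) steps iff its last bit is 1, and then outputs the
    bits before that marker.  Each output x thus has exactly one
    program, of length len(x) + 1, with weight 2**-(len(x) + 1)."""
    if program and program[-1] == 1 and steps >= len(program):
        return tuple(program[:-1])
    return None  # hasn't halted within `steps` (or never will)

def lower_approx(x, n):
    """Stage n of a lower semi-computation of the toy prior of x:
    dovetail over all programs of length <= n, run each for n steps,
    and sum 2**-len(p) over those that have halted with output x.
    The stages are nondecreasing in n and converge to the true value
    from below."""
    x = tuple(x)
    total = 0.0
    for length in range(1, n + 1):
        for bits in product((0, 1), repeat=length):
            if run(bits, n) == x:
                total += 2.0 ** -length
    return total
```

Under this toy machine, `lower_approx((0,), n)` is 0 at stage 1 and reaches its limit 1/4 at stage 2; for the real universal machine the stages behave the same way structurally, but no computable function tells you at which stage the approximation has gotten close.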
Thanks for bringing this up.
However, I’m skeptical that lower semi-computability really gets us much. While there is a TM that converges to the UP, we have no (computable) way of knowing how close the approximation is at any given time. That is, the UP is not “estimable” as defined in Hutter 2005.
So if someone hands you a TM and tells you it lower semi-computes the UP, this TM is not going to be of much use to you: its output at any finite time could be arbitrarily bad for whatever calculation you’re trying to do, and you’ll have no way of knowing.
In other words, while you may know the limiting behavior, this information tells you nothing about what you should expect to observe, because you can never know whether you’re “in the limit yet,” so to speak. (If this language seems confusing, feel free to ignore the previous sentence—for some reason it clarifies things for me.)
(It’s possible that this non-“estimability” property is less bad than I think for some reason, IDK.)
I’m less familiar with the probabilistic Turing machine result, but my hunch is that one would face a similar obstruction with that TM as well.
Yes, I mostly agree with everything you said—the limitation with the probabilistic Turing machine approach (it’s usually described equivalently as the a priori probability, in terms of monotone TMs) is that you can get samples, but you can’t use those to estimate conditionals. This is connected to the typical problem of computing the normalization factor in Bayesian statistics. It’s possible that these approximations would be good enough in practice, though.
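To make the “you can sample, but conditionals are hard” point concrete, here is a sketch with a toy monotone machine (an invented stand-in, not the actual universal machine): feeding it fair coin flips induces a distribution you can sample from exactly, yet turning those samples into conditional probabilities runs into the normalization problem, since conditioning on a rare prefix leaves you with few relevant samples and no computable bound on the missing mass.

```python
import random

def sample(rng):
    """Sample from the output distribution of a toy monotone machine
    (an assumed convention for illustration): read fair coin flips
    until the first 1, which acts as a halt marker, and output the
    zeros read before it.  The output of k zeros thus has probability
    2**-(k + 1); the empty output has probability 1/2."""
    out = []
    while True:
        if rng.randrange(2) == 1:   # halt marker seen: emit the output
            return tuple(out)
        out.append(0)               # otherwise extend the output

# Monte Carlo frequencies approximate the toy marginal probabilities,
# but a conditional like P(output extends prefix a | prefix b) needs
# the total weight on prefix b as a normalizer -- and for the real
# universal machine that normalizer is only lower semi-computable.
rng = random.Random(0)
draws = [sample(rng) for _ in range(10_000)]
freq_empty = draws.count(()) / len(draws)
```

Here `freq_empty` will hover near the true value 1/2 for this toy machine; the point of the comment above is that no analogous certified estimate exists for conditionals of the real a priori probability.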