You get completely different results by doing that. The universal prior gives higher probability to simpler hypotheses -- 0000000000000000 is more likely than 0010111001001101, or at least analogous things are true once the strings get long enough. The idea is that if there’s any computable regularity in what you’re trying to model, as you get more data you’ll recognize that regularity (i.e., your posterior will give much higher probability to futures in which the regularity continues than to ones in which it doesn’t).
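(To make the simplicity-weighting concrete, here is a minimal toy sketch in Python. It is emphatically not the real universal prior: instead of enumerating programs for a universal Turing machine, it enumerates repeating bit patterns and weights each by 2^-length. The function toy_prior_mass and the cutoff max_pattern_len are inventions for illustration only.)

```python
# Toy stand-in for simplicity weighting (NOT the actual universal prior):
# a "hypothesis" is a bit pattern repeated to fill the string, and each
# pattern gets prior weight 2^-(pattern length). Strings with a short
# description collect far more prior mass than incompressible-looking ones.
from itertools import product

def toy_prior_mass(target, max_pattern_len=16):
    """Sum 2^-k over all patterns of length <= max_pattern_len whose
    repetition reproduces `target`."""
    mass = 0.0
    for k in range(1, max_pattern_len + 1):
        for bits in product("01", repeat=k):
            pattern = "".join(bits)
            candidate = (pattern * (len(target) // k + 1))[:len(target)]
            if candidate == target:
                mass += 2.0 ** (-k)   # shorter description => more weight
    return mass

print(toy_prior_mass("0000000000000000"))  # ~1.0: many short descriptions
print(toy_prior_mass("0010111001001101"))  # ~0.000015: only its own length-16 description
```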
Ah so, I understand. But in that case, doesn’t evaluating your prior require you to integrate over the space of all possible Turing machines? There is no conceptual difficulty with this, but I must say that it strikes me as computationally rather formidable for a poxy little throw-four-possibly-biased-coins problem. :)
It’s not just computationally formidable, it’s computationally impossible. :-)
My understanding was that it’s perfectly possible to solve the Halting Problem for particular TMs.
For a TM with n states, you begin running it: either it halts at some point, or it runs for BusyBeaver(n)+1 steps, in which case you know it never halts.
So as long as you have a fixed size n that all the TMs you care about fit into, and you know the Busy Beaver number for n, you can in fact use the universal prior. (Busy Beaver numbers are uncomputable in general, though we know a fair number of small ones; and the requirement of knowing the exact value can be weakened to something practical, like having a value that is the Busy Beaver number with high probability.)
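(For concreteness, a small Python sketch of that procedure, assuming someone hands us the value BB(n) in the maximum-steps sense, which is of course exactly the part that is uncomputable in general. The dict encoding and the halting state "H" are just conventions picked for this example.)

```python
# Decide halting for an n-state TM, given the value bb_n = BB(n) from outside.
# A machine is a dict: (state, symbol) -> (write, move, next_state), and it
# halts by entering state "H". All the hard work is hidden inside bb_n.

def halts(tm, bb_n, start_state="A"):
    """Run the machine for up to BB(n)+1 steps; if it has not halted
    within BB(n) steps, it never will."""
    tape = {}                       # sparse tape, default symbol 0
    head, state = 0, start_state
    for _ in range(bb_n + 1):
        if state == "H":
            return True             # halted within the Busy Beaver bound
        write, move, state = tm[(state, tape.get(head, 0))]
        tape[head] = write
        head += 1 if move == "R" else -1
    return state == "H"             # never reached "H" in time => never halts

# Example: the 2-state Busy Beaver champion, which halts after 6 steps (BB(2) = 6).
bb2_champion = {
    ("A", 0): (1, "R", "B"), ("A", 1): (1, "L", "B"),
    ("B", 0): (1, "L", "A"), ("B", 1): (1, "R", "H"),
}
print(halts(bb2_champion, bb_n=6))  # True
```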
The fact that BB(n) is uncomputable means that your procedure doesn’t, in fact, work. No?
You don’t, in any case, have a fixed n. (Unless perhaps you are only interested in modelling Turing machines that fit in the universe—but in that case, wouldn’t you also want your own computations to fit in the universe?)
Finding BB(n) for all n is uncomputable. Finding BB(n) where n=10, say, is not uncomputable.
Yeah, this is an interesting point. My thinking is that yes, we do only care about modeling Turing machines that fit in the universe, since by definition those are the only Turing machines whose output we will ever observe. We might also have to bite the bullet and say we only care about predicting Turing machines which are small enough that there is room left in the universe for us; we don’t want to perfectly model the entire universe, just small parts.
If it’s uncomputable in general then it’s uncomputable for some particular n. I expect that in fact it’s uncomputable for any n above some rather small value. (In particular, once n is large enough for there to be n-state Turing machines that unprovably don’t always halt.)
Well, if you only care about predicting smallish Turing machines, then by the same reasoning prediction is only useful if it can be done by smallish machines, in reasonable time. And even for very small n this isn’t true for calculating BB(n).
Just out of my own curiosity, I checked Wikipedia on Busy Beaver numbers: BB(10) > 3^^^3.
While the article notes that any finite set of values (such as BB(1) through BB(10)) is trivially computable, I think it is safe to call this intractable in practice.
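(For a sense of scale, the ^s here are Knuth's up-arrows: ^^ is iterated exponentiation, ^^^ is iterated ^^, and so on. A tiny Python sketch, with the caveat that only the small cases are actually computable:)

```python
# Knuth up-arrow notation. 3^^^3 is a power tower of 3s more than
# seven trillion levels high, so only the small cases below terminate.
def up(a, b, arrows):
    """Compute a ^(arrows) b in Knuth's up-arrow notation."""
    if arrows == 1:
        return a ** b
    result = a
    for _ in range(b - 1):
        result = up(a, result, arrows - 1)
    return result

print(up(3, 3, 1))  # 3^3  = 27
print(up(3, 3, 2))  # 3^^3 = 3^(3^3) = 7625597484987
# up(3, 3, 3) would be 3^^7625597484987 -- do not try it.
```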
Hey, don’t laugh. That’s useful to know: now you know any Turing machine with fewer than 10 states is bluffing when it threatens 3^^^^3 specks of sand in your eyes.
Off the cuff, couldn’t it actually accomplish that if it just doesn’t terminate?
If it doesn’t terminate, then I think its threat would be an infinite number of sand specks, wouldn’t it?
How did you conclude that from a lower bound?
I don’t think the lower bound will turn out to be all that far off, and I tossed in another ^ as insurance. Jests don’t have to be air-tight.
I don’t think so, provided you limit yourself to Turing machines constructed of at most a universe’s worth of energy and have unlimited time for the computation. However, that said, invoking such a heavy-duty prior seems to me a bit of a cop-out; you might almost as well invoke “My prior is what Omega told me”. For choosing between Bayesian and frequentist approaches regarding actual problems in the real brain space available to humans, I prefer priors that can be computed in something like a human lifetime. To show that a Bayesian with a prior that would take multiple billion years to compute can do better than a frequentist is not really a very strong result.