“Of course it can be any sort of universal computer; why would we care whether it’s a Turing machine or some other sort?”
Well, different reference machines produce different prior distributions—so the distribution used matters initially, when the machine is new to the world.
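(A standard fact worth keeping in mind here, though it is not spelled out in the exchange: the priors induced by two reference machines U and V dominate each other up to a multiplicative constant,

```latex
M_U(x) \;\ge\; 2^{-c_{U,V}} \, M_V(x) \qquad \text{for all } x,
```

where c_{U,V} is the length of a U-program that simulates V, and symmetrically with the machines swapped. Because the constant does not depend on x, the choice of reference machine washes out in the limit but can dominate early on, when little data has been seen, which is the sense in which the distribution used “matters initially”.)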
“Your statement that taking a universal computer and generating the corresponding universal prior will get you “literally any distribution of priors you can imagine” is just false, especially as it will only get you uncomputable ones!”
“Any distribution you can compute”, then—if you prefer to think that you can imagine the uncomputable.
“You have still done nothing to demonstrate this.”
Actually, I think I give up trying to explain. From my perspective you seem to have some kind of tangle around the word “universal”. “Universal” could usefully refer to “universal computation” or to a prior that covers “every hypothesis in the universe”. There is also the “universal prior”—but I don’t think “universal” there has quite the same significance that you seem to think it does. There seems to be repeated miscommunication going on in this area.
It seems non-trivial to describe the class of priors that leads to “fairly rapid” belief convergence in an intelligent machine. Suffice to say, I think that class is large—and that the details of priors are relatively insignificant—provided there is not too much “faith”—or “near faith”. Part of the reason for that is that priors usually get rapidly overwritten by data. That data establishes its own subsequent prior distributions for all the sources you encounter—and for most of the ones that you don’t. If you don’t agree, fine—I won’t bang on about it further in an attempt to convince you.
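(To illustrate the “priors get rapidly overwritten by data” intuition being appealed to here, and only to illustrate it rather than establish it for the hypothesis classes actually in dispute, here is a minimal sketch: two Bayesian agents with sharply different Beta priors over a coin’s bias end up with nearly identical posteriors after a modest number of flips. The Beta/coin setup and the numbers are mine, chosen purely for simplicity.)

```python
import random

def beta_posterior_mean(alpha, beta, heads, tails):
    """Posterior mean of the coin's bias under a Beta(alpha, beta) prior,
    after observing the given counts (Beta-Bernoulli conjugacy)."""
    return (alpha + heads) / (alpha + beta + heads + tails)

random.seed(0)
true_bias = 0.7
flips = [random.random() < true_bias for _ in range(1000)]

# Two agents with very different priors: one expects tails-heavy coins,
# the other expects heads-heavy coins.
priors = {"agent A": (1.0, 10.0), "agent B": (10.0, 1.0)}

for n in (0, 10, 100, 1000):
    heads = sum(flips[:n])
    tails = n - heads
    estimates = {name: round(beta_posterior_mean(a, b, heads, tails), 3)
                 for name, (a, b) in priors.items()}
    print(n, estimates)

# As n grows, both posterior means approach the empirical frequency,
# and the disagreement attributable to the differing priors shrinks.
```

The contested question in this exchange is, of course, whether anything like this carries over, and how quickly, to the far richer priors an intelligent machine would need; the toy example does not address that.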
Firstly, please use Markdown quotes for ease of reading? :-/
> Well, different reference machines produce different prior distributions—so the distribution used matters initially, when the machine is new to the world.
Indeed, but I don’t think that’s really the property under discussion.
> “Any distribution you can compute”, then—if you prefer to think that you can imagine the uncomputable.
...huh? Maybe you are misunderstanding the procedure in question here. We are not taking arbitrary computations that output distributions and using those distributions. That would get you arbitrary computable distributions. Rather, we are taking arbitrary universal computers/UTMs/Turing-complete programming languages/whatever you want to call them, and then generating a distribution as “the probability of x is the sum of 2^-length over all programs that output something beginning with x” (possibly normalized). I.e., we are taking a reference machine and generating the corresponding universal prior.
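(For concreteness, here is the construction just described written out; U is the chosen reference machine, |p| is the length of program p, and the sum ranges over programs whose output begins with x:

```latex
M_U(x) \;=\; \sum_{p \,:\, U(p) \text{ outputs } x\ldots} 2^{-|p|}
```

If U is a prefix machine, its halting programs form a prefix-free set, so by Kraft’s inequality the total weight is at most 1 and M_U is a semimeasure; the “possibly normalized” caveat refers to rescaling it into a proper probability distribution.)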
Not only will this not get you “any distribution you can compute”, it won’t get you any distribution you can compute at all. The resulting distribution is always uncomputable. (And hence, in particular, not practical, and presumably not “reasonable”, whatever that may mean.)
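(A sketch of the standard reason for this uncomputability claim: the sum above can only be approximated from below. Running every program of length at most t for t steps gives increasing lower bounds

```latex
M_U^{(t)}(x) \;=\; \sum_{\substack{p \,:\, |p| \le t,\\ U(p) \text{ outputs } x\ldots \text{ within } t \text{ steps}}} 2^{-|p|} \;\nearrow\; M_U(x),
```

but there is no computable bound on how close M_U^{(t)} is to the limit; roughly, such a bound would let you decide which further programs ever contribute, which is a halting-problem-like question. So M_U is lower semicomputable but not computable.)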
Am I mistaken in asserting that this is what was under discussion?
> It seems non-trivial to describe the class of priors that leads to “fairly rapid” belief convergence in an intelligent machine. Suffice to say, I think that class is large—and that the details of priors are relatively insignificant—provided there is not too much “faith”—or “near faith”. Part of the reason for that is that priors usually get rapidly overwritten by data. That data establishes its own subsequent prior distributions for all the sources you encounter—and for most of the ones that you don’t. If you don’t agree, fine—I won’t bang on about it further in an attempt to convince you.
You don’t have to attempt to convince me, but do note that despite asserting it repeatedly you have, in fact, done nothing to establish the truth of this assertion or the validity of this intuition, which I have good reason to believe to be unlikely, as I described earlier.
FWIW, what I meant was that, by altering the reference machine, p(x) for all bitstrings x less than a zillion bits long can be made into any set of probabilities you like (provided they don’t add up to more than 1, of course).
The reference machine defines the resulting probability distribution completely.
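(A sketch of why something in this direction works; the construction is mine and not necessarily the one being alluded to. Suppose you want finitely many strings x_1, …, x_k to receive prior weights close to targets q_1, …, q_k, with q_1 + … + q_k a little less than 1 so there is some slack. Build a prefix machine U′ that decodes a Shannon–Fano-style code: codeword c_i, of length ⌈−log₂ q_i⌉, makes U′ print x_i, and one reserved codeword acts as an escape into a fixed universal machine U, so U′ is still universal. Since c_i is itself a program whose output begins with x_i, the universal prior of U′ satisfies

```latex
M_{U'}(x_i) \;\ge\; 2^{-\lceil -\log_2 q_i \rceil} \;>\; \tfrac{1}{2}\, q_i ,
```

and with a finer-grained code these lower bounds can be pushed as close to the targets as you like for any finite set of strings.)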
AH! So you are making a comment on the use of universal priors to approximate arbitrary finite priors (and hence presumably vice versa). That is interesting, though I’m not sure what it has to do with eventual convergence. You should have actually stated that at some point!