I don’t understand the objection to the universal prior. Definition and computation are not the same thing. Yes, definition is subject to Berry’s Paradox if you don’t differentiate between definition, meta-definition, etc.; but computation is not. In particular, what you list as “a short description” is only computable if P is, which a universal prior won’t be. (I would think the non-computability in itself would be more of a problem!)
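(For reference, the standard form of the universal prior under discussion, stated here for concreteness rather than quoted from the thread. With U a universal prefix Turing machine and \ell(p) the length of program p in bits,

M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)},

where the sum runs over all programs p whose output begins with x. The sum can be approximated from below by dovetailing over programs but never evaluated exactly, so M is only lower semi-computable; that is the non-computability being referred to.)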
You probably have to read “Is induction unformalizable?” before “Berry’s Paradox and universal distribution”. (There’s a link to the former in the latter, but I’m guessing you skipped it.)
I may be repeating what Vladimir said here, but it seems to me your objection is basically “Oh shit! We can diagonalize!” (Which, if we then collapse the levels, can get us a Berry paradox, among others...)
So, yes, it follows that for any system of description we can think of, there’s some potential truth its corresponding “universal prior” (question—do those exist in general?) won’t be able to infer. But the fact that this applies to any such system means we can’t use it as a criterion to decide between them. At some point we have to just stop and say, “No, you are not allowed to refer to this concept in formulating descriptions.” Maybe computability isn’t the best one, but you don’t seem to have actually given evidence that would support any other such system over it.
Or am I just missing something big here?
The thing I got out of it was that human brain processes appear to be able to do something (assign a nonzero probability to a non-computable universe) that our current formalization of general induction cannot do and we can’t really explain why this is.
I would also like to see this question addressed.
Let me distill down what timtyler and Dre have written into a concisely stated question:
Premise 1: a human’s brain implements a computable algorithm
Premise 2: a human can update on evidence and become convinced that a halting oracle exists
I’m not sure if it is true that: when exposed to evidence from an actually existing halting oracle, the posterior probability of an algorithm implementing the predictions of a Premise 2-type human will exceed that of an algorithm that assigns zero prior probability to an uncomputable universe
but if so, then—Conclusion: an agent with a Solomonoff prior can become convinced that the universe contains a halting oracle.
And my question: did I do a stupid?
The last step doesn’t look valid to me. After updating on the evidence, you have a human who thinks they’ve seen a halting oracle, and a Solomonoff agent who thinks he’s seen a highly improbable event that doesn’t involve a halting oracle. The fact that the human assigns a higher probability to the observations is unconvincing, because he could’ve just been extraordinarily lucky.
Also, there are entities that are impossible to distinguish from halting oracles using all the computational resources in the universe, which are not actually halting oracles. For example, a “can be proven to halt using a proof shorter than 3^^^3 bits” oracle has nonzero probability under the Solomonoff prior.
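A rough sketch of such an impostor, with a hypothetical proof checker proves_halting_in_S standing in for a fixed formal system S and N standing in for the 3^^^3-bit bound (all names here are illustrative, not from the comment):

def impostor_oracle(program_source, N, proves_halting_in_S):
    # Answer True ("halts") iff some S-proof shorter than N bits shows that the
    # program halts; otherwise answer False ("does not halt").
    from itertools import product
    for length in range(1, N):
        for bits in product("01", repeat=length):
            if proves_halting_in_S("".join(bits), program_source):
                return True
    return False

This is an ordinary (if astronomically slow) computable program, which is the comment’s point: it gets nonzero weight under the Solomonoff prior, while, the claim goes, no feasible set of queries can tell it apart from a genuine halting oracle.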
“proof shorter than 3^^^3 bits” means “proof shorter than 3^^^3 bits in some formal system S”, right? Then I can write a program P that iterates through all possible proofs in S of length < 3^^^3 bits, and create a list of all TMs provable to terminate in less than 3^^^3 bits in S. Then P checks to see if P itself is contained in this list. If so, it goes into an infinite loop, otherwise it halts.
Now we know that if S is sound, then P halts AND can’t be proven to halt using a proof shorter than 3^^^3 bits in S. What happens if we feed this P to your impostor oracle?
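For concreteness, a sketch of the structure of this P in Python. The hypothetical checker proves_halting_in_S and the self-reference step carry all the weight; a real P would be written as a quine so it can refer to its own code. Names and the bound N are illustrative:

import inspect
from itertools import product

def has_short_halting_proof(program_source, N, proves_halting_in_S):
    # Enumerate every candidate S-proof shorter than N bits.
    for length in range(1, N):
        for bits in product("01", repeat=length):
            if proves_halting_in_S("".join(bits), program_source):
                return True
    return False

def P(N, proves_halting_in_S):
    my_source = inspect.getsource(P)    # stand-in for a proper quine construction
    if has_short_halting_proof(my_source, N, proves_halting_in_S):
        while True:                     # S proves (within the bound) that P halts: loop forever
            pass
    # otherwise no short proof exists, so P halts here

If S is sound, the only consistent outcome is the one described above: P halts, and no S-proof shorter than N bits shows that it does.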
That works if you can guess S, or some S’ that is more powerful than S. But might there be a formal system that couldn’t be guessed this way? My first thought was to construct a parameterized family of systems, S(x), where S(x) can prove that a program built against S(y) by a trick like this halts; but that can be defeated by simply iterating over systems, if you figure out the parameterization. But suppose you tried a bunch of formal logics this way, and the oracle passed them all; how could you ever be sure you hadn’t missed one? What about a proof system plus a tricky corner-case-detection heuristic that happens to cover all your programs?
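A sketch of that testing loop, assuming hypothetical helpers: candidate_systems is some enumeration of formal systems, and diagonal_program_against(S) builds the P above for a given system S (both are illustrative):

def probe_oracle(oracle, candidate_systems, diagonal_program_against, max_systems=100):
    # Try to expose a bounded-proof-search impostor by feeding it diagonal
    # programs built against a range of candidate formal systems.
    for i, S in enumerate(candidate_systems):
        if i >= max_systems:
            break
        P_S = diagonal_program_against(S)
        # If S is sound, P_S halts, so a genuine halting oracle answers True;
        # an impostor doing bounded proof search in S (or anything weaker) answers False.
        if not oracle(P_S):
            return "impostor exposed by candidate system %d" % i
    # Passing every tested system proves nothing: the impostor may be built on a
    # system, or a corner-case heuristic, that was never enumerated.
    return "passed all tested systems (still not known to be a real oracle)"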
Re: “I can write a program P that iterates through all possible proofs in S of length < 3^^^3 bits, and create a list of all TMs provable to terminate in less than 3^^^3 bits in S.”
An unspecified halting analysis on 2^3^^^3 programs?!?
Not within the universe’s expected lifespan, you can’t!
Apart from that, this looks rather like an intractable rearrangement of:
http://en.wikipedia.org/wiki/Berry_paradox
The conclusion follows (I think) because the Solomonoff agent is computing the posterior probability of all algorithms, including the one that implements the same computation the human implements. So after updating, the Solomonoff agent’s posterior probability for that algorithm should be higher than that of any other algorithm, and it draws the same conclusion the human does.
Given the existence of such indistinguishable impostor oracles, then contra Wei Dai, I don’t know how any human attempting to internally implement Bayesian inference could possibly become convinced that a halting oracle exists.
You lost a level of indirection in there; computing the output of an algorithm does not mean believing that the output of that algorithm is true or even plausible. So the agent will correctly predict what the human will say, and believe that the human is mistaken.
The level of indirection isn’t necessary: the Solomonoff agent’s distribution is a weighted mixture of the outputs of all possible Turing machines, weighted according to the posterior probability of that Turing machine being the one that is generating the observations. Any Turing machine that predicts that the putative halting oracle gets one wrong on a particular trial gets downweighted to zero when that fails to occur.
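A toy version of that mixture, with a small finite pool of predictor programs standing in for “all possible Turing machines” (the pool, names, and weights below are made up for illustration):

def update_mixture(weights, hypotheses, history, observed):
    # One Bayesian update: each hypothesis maps the history to a distribution
    # over the next observation; its weight is multiplied by the probability it
    # gave to what actually happened, then everything is renormalized.
    new_weights = {}
    for name, predict in hypotheses.items():
        likelihood = predict(history).get(observed, 0.0)
        new_weights[name] = weights[name] * likelihood
    total = sum(new_weights.values())
    return {name: w / total for name, w in new_weights.items()}

hypotheses = {
    "oracle-answers-correctly": lambda hist: {"correct": 1.0},
    "oracle-slips-up-this-trial": lambda hist: {"correct": 0.0, "wrong": 1.0},
}
weights = {"oracle-answers-correctly": 0.5, "oracle-slips-up-this-trial": 0.5}
print(update_mixture(weights, hypotheses, history=[], observed="correct"))
# {'oracle-answers-correctly': 1.0, 'oracle-slips-up-this-trial': 0.0}

The second hypothesis assigned probability zero to the observation that occurred, so it is downweighted to zero, exactly as described above.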
That looks like the same position that Eliezer took, and I think I already refuted it. Let me know if you’ve read the one-logic thread and found my argument wrong or unconvincing.
The idea is that the universal prior is really about the observation-predicting algorithms that agents run, not about predicting what will happen in the world. So, for any agent that runs a given anticipation-defining algorithm and rewards/punishes the universal-prior-based agent according to it, there is an anticipation-computing program that will obtain higher and higher probability in the universal-prior-based agent.
This, by the way, again highlights the distinction between what will actually happen and what a person anticipates—predictions are about capturing the concept of anticipation, an aspect of how people think, and not about what in fact can happen.
Your objection to the universal prior is: “what if Occam’s razor breaks”?!?
Engineers would not normally be concerned with such things. Occam’s razor has been fine for hundreds of years. If it ever breaks, we will update our probability expectations with whatever the new distribution is. Surely that is no big deal.
Original priors are not too important, anyway. As soon as an agent is born, it is flooded with information from the world about the actual frequencies of things—and its original priors are quickly washed away and replaced with experience. If its experiences involve encountering the uncomputable, the agent will simply update accordingly.
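A small illustration of priors being washed out by data, using a Beta-Bernoulli model with made-up numbers:

def beta_posterior_mean(prior_a, prior_b, successes, failures):
    # Beta(a, b) prior plus Bernoulli observations gives a
    # Beta(a + successes, b + failures) posterior.
    a, b = prior_a + successes, prior_b + failures
    return a / (a + b)

data = (700, 300)                            # 700 occurrences, 300 non-occurrences (made up)
print(beta_posterior_mean(1, 1, *data))      # uniform prior          -> about 0.70
print(beta_posterior_mean(50, 2, *data))     # strongly skewed prior  -> about 0.71

After the same stream of observations, two agents that started with very different priors end up with nearly the same posterior, which is the washing-out being described.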
So: this topic seems like angels and pinheads—at least to me.