The relevant passage of the essay (p. 65) goes into more detail than the paraphrase you quoted, but the short answer is: how does the superintelligence know it should assume the uniform distribution, and not some other distribution? For example, suppose someone tips it off about a third Earth, C, which is “close enough” to Earths A and B even if not microscopically identical, and in which you made the same decision as in B. Therefore, this person says, the probabilities should be adjusted to (1/3,2/3) rather than (1/2,1/2). It’s not obvious whether the person is right (is Earth C really close enough to A and B?), but the superintelligence decides to give the claim some nonzero credence. Then boom, its prior is no longer uniform. It might still be close, but if there are thousands of freebits, then the distance from uniformity will quickly get amplified to almost 1.
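To see how quickly small deviations compound, here is a toy Python sketch (purely illustrative: it models each freebit as an independent coin with a small bias eps, an assumption made only for this example) that lower-bounds the total variation distance between such a slightly non-uniform prior and the uniform one, using the Bhattacharyya coefficient:

    import math

    def tv_lower_bound(eps, n):
        """Lower bound on the total variation distance between n independent
        Bernoulli(0.5 + eps) bits and n uniform bits, via the standard
        inequality TV >= 1 - BC, where the Bhattacharyya coefficient BC of a
        product distribution is the product of the per-bit coefficients."""
        p = 0.5 + eps
        bc_bit = math.sqrt(p * 0.5) + math.sqrt((1 - p) * 0.5)
        return 1 - bc_bit ** n

    # With a 5% per-bit bias, a few thousand bits already push the prior
    # almost maximally far from uniform.
    for n in (10, 100, 1000, 5000):
        print(n, round(tv_lower_bound(0.05, n), 3))   # roughly 0.01, 0.12, 0.72, 0.998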
Your prescription corresponds to E. T. Jaynes’s “MaxEnt principle,” which basically says to assume a uniform (or more generally, maximum-entropy) prior over any degrees of freedom that you don’t understand. But the conceptual issues with MaxEnt are well-known: the uniform prior over what, exactly? How do you even parameterize “the degrees of freedom that you don’t understand,” in order to assume a uniform prior over them? (You don’t want to end up like the guy interviewed on Jon Stewart, who earnestly explained that, since the Large Hadron Collider might destroy the earth and might not destroy it, the only rational inference was that it had a 50% chance of doing so. :-) )
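The “uniform prior over what, exactly?” worry can be made concrete with a standard toy example (not from the essay; the square and the numbers are invented purely for illustration): two equally “ignorant” priors about a square of unknown size, one uniform over its side length and one uniform over its area, give different answers to the very same question.

    import random

    random.seed(0)
    N = 100_000

    # Prior 1: side length L uniform on [0, 1].
    # Prior 2: area A = L^2 uniform on [0, 1], so L = sqrt(A).
    uniform_in_length = [random.random() for _ in range(N)]
    uniform_in_area = [random.random() ** 0.5 for _ in range(N)]

    # Same question, two "maximally ignorant" answers:
    print(sum(l < 0.5 for l in uniform_in_length) / N)  # P(L < 0.5) ~ 0.50
    print(sum(l < 0.5 for l in uniform_in_area) / N)    # P(L < 0.5) ~ 0.25

Which parameterization deserves the uniform treatment is exactly the kind of judgment call that the MaxEnt principle by itself does not settle.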
To clarify: I don’t deny the enormous value of MaxEnt and other Bayesian-prior-choosing heuristics in countless statistical applications. Indeed, if you forced me at gunpoint to bet on something about which I had Knightian uncertainty, then I too would want to use Bayesian methods, making judicious use of those heuristics! But applied statisticians are forced to use tricks like MaxEnt, precisely because they lack probabilistic models for the systems they’re studying that are anywhere near as complete as (let’s say) the quantum model of the hydrogen atom. If you believe there are any systems in nature for which, given the limits of our observations, our probabilistic models can never achieve hydrogen-atom-like completeness (even in principle), so that we’ll always be forced to use tricks like MaxEnt, then you believe in freebits. There’s nothing more to the notion than that.
how does the superintelligence know it should assume the uniform distribution, and not some other distribution?
Symmetry arguments? And since our superintelligence understands the workings of your brain minus this qubit, the symmetry isn’t between choices A and B, but rather between the points on the Bloch sphere of the qubit. Learning that in some microscopically independent trial a qubit had turned out in such a way that you chose B doesn’t give the superintelligence any information about your qubit, and so wouldn’t change its prediction.
A less-super intelligence, who was uncertain about the function (your brain) that mapped qubits onto decisions, would update in favor of the functions that produced B—the degree to which this mattered would depend on its probability distribution over functions.
I don’t deny the enormous value of MaxEnt and other Bayesian-prior-choosing heuristics in countless statistical applications. Indeed, if you forced me at gunpoint to bet on something about which I had Knightian uncertainty, then I too would want to use Bayesian methods, making judicious use of those heuristics!
This still seems weird, though I believe in freebits by your requirement. Why would you want to use Bayesian methods if no guess (in the form of a probability, to be scored according to some rule that rewards good guesses) is better than another on Knightian problems? And if some guess is better than another, why not use the best guess? That’s what using probability is all about—if you didn’t have incomplete information, you wouldn’t need to guess at all.
Well, I can try to make my best guess if forced to—using symmetry arguments or any other heuristic at my disposal—but my best guess might differ from some other, equally-rational person’s best guess. What I mean by a probabilistic system’s being “mechanistic” is that the probabilities can be calculated in such a way that no two rational people will disagree about them (as with, say, a radioactive decay time, or the least significant digit of next week’s Dow Jones average).
Also, the point of my “Earth C” example was that symmetry arguments can only be used once we know the reference class of things to symmetrize over—but which reference class to choose is precisely the sort of thing about which rational people can disagree, with no mathematical or empirical way to resolve the disagreement. (And moreover, where it’s not even obvious that there is a “true, pre-existing answer” out there in the world, or what it would mean for something to count as such an answer.)
Hm. So then do we have two types of problems you’re claiming Bayesian inference isn’t good enough for? One is problems involving freebits, and another is problems involving disagreements about reference classes?
The reason I don’t think “Earth C” had an impact on the perfect-prediction-except-for-isolated-qubits case is that I’d turned the reference class problem into an information content problem, which actually does have a correct solution.
where it’s not even obvious that there is a “true, pre-existing answer” out there in the world
I think this is the normal and acceptable state of affairs for all probability assignments.
How would one test experimentally whether the uncertainty in question is Knightian? Assuming we do our best to make many repeatable runs of some experiment, what set of outcomes would point to the MaxEnt (or any) prior being inadequate?
shminux: I don’t know any way, even in principle, to prove that uncertainty is Knightian. (How do you decisively refute someone who claims that if only we had a better theory, we could calculate the probabilities?) Though even here, there’s an interesting caveat. Namely, I also would have thought as a teenager that there could be no way, even in principle, to “prove” something is “truly probabilistic,” rather than deterministic but with complicated hidden parameters. But that was before I learned the Bell/CHSH theorem, which does pretty much exactly that (if you grant some mild locality assumptions)! So it’s at least logically possible that some future physical theory could demand Knightian uncertainty in order to make internal sense, in much the same way that quantum mechanics demands probabilistic uncertainty.
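For concreteness, here is a minimal sketch of the CHSH calculation being alluded to (the measurement angles and the numpy construction are just the textbook choices, nothing specific to the essay): the singlet state yields a CHSH value of 2*sqrt(2), beyond the bound of 2 that any local-hidden-variable account can reach.

    import numpy as np

    # Pauli matrices and the singlet state (|01> - |10>)/sqrt(2).
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

    def spin(theta):
        # Spin measurement along angle theta in the x-z plane (outcomes +/-1).
        return np.cos(theta) * Z + np.sin(theta) * X

    def corr(a, b):
        # Correlation <A(a) x B(b)> in the singlet state.
        op = np.kron(spin(a), spin(b))
        return float(np.real(singlet.conj() @ op @ singlet))

    a1, a2, b1, b2 = 0.0, np.pi / 2, np.pi / 4, 3 * np.pi / 4
    S = corr(a1, b1) - corr(a1, b2) + corr(a2, b1) + corr(a2, b2)
    print(abs(S))   # ~2.828 = 2*sqrt(2); local hidden variables cannot exceed 2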
But setting aside that speculative possibility, there’s a much more important point in practice: namely, it’s much easier to rule out that a given source of uncertainty is Knightian, or at least to place upper bounds on how much Knightian uncertainty it can have. To do so, you “merely” have to give a model for the system so detailed that, by using it, you can:
(1) calculate the probability of any event you want, to any desired accuracy,
(2) demonstrate, using repeated tests, that your probabilities are well-calibrated (e.g., of the things you say will happen roughly 60% of the time, roughly 60% of them indeed happen, and moreover the subset of those things that happen passes all the standard statistical tests for not having any further structure; a toy calibration check is sketched just after this list), and
(3) crucially, provide evidence that your probabilities don’t merely reflect epistemic ignorance. In practice, this would almost certainly mean providing the causal pathways by which the probabilities can be traced down to the quantum level.
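As one concrete reading of step (2), here is a minimal calibration check on synthetic data (the forecast values and the data-generating process are hypothetical, chosen only to illustrate the bookkeeping; it covers the basic calibration test but not the further tests for residual structure):

    import random
    from collections import defaultdict

    random.seed(1)

    # A hypothetical model that assigns one of a few probabilities to each event.
    def model_forecast():
        return random.choice([0.1, 0.3, 0.6, 0.9])

    # Synthetic world in which events really do occur with the forecast probability.
    records = []
    for _ in range(100_000):
        p = model_forecast()
        happened = random.random() < p
        records.append((p, happened))

    # Group events by forecast and compare predicted vs. observed frequency.
    buckets = defaultdict(list)
    for p, happened in records:
        buckets[p].append(happened)

    for p in sorted(buckets):
        outcomes = buckets[p]
        print(f"forecast {p:.1f}: observed {sum(outcomes) / len(outcomes):.3f} "
              f"over {len(outcomes)} events")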
Admittedly, (1)-(3) sound like a tall order! But I’d say that they’ve already been done, more or less, for all sorts of complicated multi-particle quantum systems (in chemistry, condensed-matter physics, etc.): we can calculate the probabilities, compare them against observation, and trace the origin of the probabilities to the Born rule.
Of course, if you have a large ensemble of identical copies of your system (or things you regard as identical copies), then that makes validating your probabilistic model a lot more straightforward, for then you can replace step (2) by direct experimental estimation of the probabilities. But in the above, I was careful never to assume that we had lots of identical copies—since if the freebit picture were accepted, then in many cases of interest to us we wouldn’t!
How do you decisively refute someone who claims that if only we had a better theory, we could calculate the probabilities?
This seems like too strong a statement. After all, if one knows exactly the initial quantum state at the Big Bang, then one also knows all the freebits. I believe that what you are after is not proving that no theory would allow us to calculate the probabilities, but rather that our current best theory does not. In your example, this would mean that knowing any amount of macrofacts from the past still would not allow us to calculate the probabilities of some future macrofacts. My question was about a potential experimental signature of such a situation.
I suspect that this would be a rather worthwhile question to seriously think about, potentially leading to Bell-style insights. I wonder what a simple toy model of a situation like that could be: a general theory G, a partial theory P, and a set of experimental data E from which one can conclude that there is no well-calibrated set of probabilities P->p(E) derivable from P alone, even though there is one from G, G->p(E). Hmm, I might be letting myself get carried away a bit.
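One crude way to instantiate the G / P / E setup in code (an invented toy construction, not anything proposed in the discussion above): let G know a hidden alternating parameter while P knows only the marginal statistics. Then P's flat forecasts pass a naive calibration check, yet the record E has structure that only G's probabilities reproduce.

    import random

    random.seed(2)
    N = 50_000

    # Hidden parameter known to the "general theory" G but not to the "partial theory" P.
    hidden = [t % 2 for t in range(N)]
    bias = [0.9 if h else 0.1 for h in hidden]
    E = [random.random() < b for b in bias]   # the experimental record

    # P forecasts 0.5 for every trial; marginally this looks well-calibrated...
    print(sum(E) / N)   # ~0.5

    # ...but consecutive outcomes disagree ~82% of the time, far from the ~50%
    # that P's fair-coin model predicts, while G (knowing the hidden parameter)
    # predicts exactly this structure.
    flips = sum(E[t] != E[t + 1] for t in range(N - 1)) / (N - 1)
    print(flips)   # ~0.82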