Suppose we’re using Laplace’s Rule of Succession on a coin. On the zeroth round before we have seen any evidence, we assign probability 0.5 to the first coinflip coming up heads. We also assign marginal probability 0.5 to the second flip coming up heads, the third flip coming up heads, and so on. What distinguishes the Laplace epistemic state from the ‘certainty of a fair coin’ epistemic state is that they represent different probability distributions over sequences of coinflips.
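To make the distinction concrete, here is a minimal sketch in Python (an illustration added here, not anything from the original exchange) that computes the probability of specific flip sequences under both epistemic states. The single-flip marginals agree at 0.5, but the sequence probabilities, and hence the conditionals, come apart.

```python
from math import comb

def laplace_sequence_prob(heads, tails):
    # P(one specific sequence with `heads` heads and `tails` tails) under a
    # uniform prior on the coin's bias p: the integral of p^h (1-p)^t dp,
    # which has the closed form h! t! / (h + t + 1)!.
    n = heads + tails
    return 1.0 / ((n + 1) * comb(n, heads))

def fair_sequence_prob(heads, tails):
    # The same probability under certainty that the coin is fair.
    return 0.5 ** (heads + tails)

# Marginals agree: both states give the first flip probability 0.5 of heads.
print(laplace_sequence_prob(1, 0), fair_sequence_prob(1, 0))      # 0.5 0.5

# Joints differ: P(HH) is 1/3 under Laplace but 1/4 under the fair coin.
print(laplace_sequence_prob(2, 0), fair_sequence_prob(2, 0))      # 0.333... 0.25

# So the conditionals differ too: after one head, the Rule of Succession
# gives P(next flip heads) = (1 + 1) / (1 + 2) = 2/3 rather than 0.5.
print(laplace_sequence_prob(2, 0) / laplace_sequence_prob(1, 0))  # 0.666...
```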
Since events can be correlated with one another, we must represent our states of knowledge by assigning probabilities to sequences or sets of events; our states of knowledge cannot be represented by stating marginal probabilities for each event independently.
We could also try to summarize some features of such epistemic states by talking about the instability of estimates—the degree to which they are easily updated by knowledge of other events—though of course this will be a derived feature of the probability distribution, rather than an ontologically extra feature of probability.
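As a rough sketch of that derived feature: two priors over the coin’s bias can share the same point estimate while differing in how far the estimate moves on the same evidence. The Beta priors below are illustrative choices, not anything from the original discussion.

```python
def posterior_mean(a, b, heads, tails):
    # Posterior mean of the bias p under a Beta(a, b) prior: a Beta prior
    # updates conjugately to Beta(a + heads, b + tails).
    return (a + heads) / (a + b + heads + tails)

for a, b, label in [(1, 1, "broad Beta(1,1)"), (100, 100, "sharp Beta(100,100)")]:
    before = posterior_mean(a, b, 0, 0)
    after = posterior_mean(a, b, 10, 0)   # then observe ten heads in a row
    print(f"{label}: estimate moves {before:.3f} -> {after:.3f}")

# broad Beta(1,1):      0.500 -> 0.917   (unstable estimate)
# sharp Beta(100,100):  0.500 -> 0.524   (nearly immovable estimate)
```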
I reject that this is a good reason for probability theorists to panic.
On the meta level I remark that panic represents a failure of reductionist effort; that is, it would be possible to reduce things to simple probabilities by putting in an effort, but there is a temptation to not put in this effort and instead complicate our view of probability. After seeing this reduction work a few dozen times, however, one begins to acquire (by Laplace’s Rule of Succession) some degree of confidence that it can be carried out on the next occasion as well, even if the manner of doing so is not immediately obvious, and a hasty assertion of a fake reduction would not be helpful.
We could also try to summarize some features of such epistemic states by talking about the instability of estimates—the degree to which they are easily updated by knowledge of other events
Yes, this is Jaynes’ A_p approach.
this will be a derived feature of the probability distribution, rather than an ontologically extra feature of probability.
I’m not sure I follow this. There is no prior distribution for the per-coin payout probabilities that can accurately reflect all our knowledge.
I reject that this is a good reason for probability theorists to panic.
Yes, it’s clear from comments that my OP was somewhat misleading as to its purpose. Overall, the sequence intends to discuss cases of uncertainty in which probability theory is the wrong tool for the job, and what to do instead.
However, this particular article intended only to introduce the idea that one’s confidence in a probability estimate is independent of the estimate itself, and to develop the A_p (meta-probability) approach to expressing that confidence.
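One way to picture that separation, assuming for illustration that the A_p state is represented as a density over the coin’s propensity p: the estimate is the mean of the density, while confidence shows up as how concentrated the density is. The two discretized densities below are made up for the example.

```python
grid = [i / 100 for i in range(101)]                         # values of p
low_conf = [1.0] * 101                                       # flat density over p
high_conf = [max(0.0, 1 - 10 * abs(p - 0.5)) for p in grid]  # peaked at 0.5

def estimate_and_spread(density):
    # The first-order probability is the mean of the density over p;
    # 'confidence' shows up as the density's concentration (small variance).
    total = sum(density)
    mean = sum(w * p for w, p in zip(density, grid)) / total
    var = sum(w * (p - mean) ** 2 for w, p in zip(density, grid)) / total
    return mean, var

print(estimate_and_spread(low_conf))    # estimate 0.5, large spread
print(estimate_and_spread(high_conf))   # same estimate 0.5, small spread
```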
I’m not sure I follow this. There is no prior distribution for the per-coin payout probabilities that can accurately reflect all our knowledge.
Are we talking about the Laplace vs. fair coins? Are you claiming there’s no prior distribution over sequences which reflects our knowledge? If so I think you are wrong as a matter of math.
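For instance, supposing hypothetically (the OP’s exact setup isn’t reproduced here) that the relevant knowledge is ‘the payout probability is either about 0.1 or about 0.9, equally likely’, a mixture prior over sequences encodes that knowledge directly:

```python
def mixture_sequence_prob(heads, tails, components=((0.5, 0.1), (0.5, 0.9))):
    # P(one specific sequence) = sum over hypotheses of
    # P(hypothesis) * p^heads * (1 - p)^tails.
    return sum(w * p**heads * (1 - p)**tails for w, p in components)

# Single-flip marginal is 0.5, exactly as for a known-fair coin...
print(mixture_sequence_prob(1, 0))                                # 0.5
# ...but one observed head shifts P(next head) to 0.82, because the
# evidence favors the p = 0.9 hypothesis over the p = 0.1 one.
print(mixture_sequence_prob(2, 0) / mixture_sequence_prob(1, 0))  # 0.82
```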
Are you claiming there’s no prior distribution over sequences which reflects our knowledge?
No. Well, not so long as we’re allowed to take our own actions into account!
I want to emphasize—since many commenters seem to have mistaken me on this—that there’s an obvious, correct solution to this problem (which I made explicit in the OP). I deliberately made the problem as simple as possible in order to present the A_p framework clearly.
Are you claiming there’s no prior distribution over sequences which reflects our knowledge?
No. Well, not so long as we’re allowed to take our own actions into account!
Heh! Yes, traditional causal models have structure beyond what is present in the corresponding probability distribution over those models, though this has to do with computing counterfactuals rather than meta-probability or estimate instability. Work continues at MIRI decision theory workshops on the search for ways to turn some of this back into probability, but yes, in my world causal models are things we assign probabilities to, over and beyond probabilities we assign to joint collections of events. They are still models of reality to which a probability is assigned, though. (See Judea Pearl’s “Why I Am Only A Half-Bayesian”.)
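A standard illustration of that extra structure, sketched here as an aside (it is not specific to the MIRI work mentioned): two structural models that induce the same observational joint distribution over (X, Y) yet disagree about what happens under an intervention.

```python
import random

def model_x_causes_y(do_x=None):
    # X -> Y: Y copies X, flipped 10% of the time by noise.
    x = random.randint(0, 1) if do_x is None else do_x
    y = x ^ (random.random() < 0.1)
    return x, y

def model_y_causes_x(do_x=None):
    # Y -> X: X copies Y, flipped 10% of the time by noise.
    y = random.randint(0, 1)
    x = y ^ (random.random() < 0.1)
    if do_x is not None:
        x = do_x          # the intervention severs Y's influence on X
    return x, y

def p_y1(model, **kw):
    # Monte Carlo estimate of P(Y = 1) under the given regime.
    trials = 100_000
    return sum(model(**kw)[1] for _ in range(trials)) / trials

# Observationally the two models induce the same joint over (X, Y),
# but they disagree about the interventional regime do(X = 1):
print(p_y1(model_x_causes_y, do_x=1))   # ~0.9: forcing X drags Y along
print(p_y1(model_y_causes_x, do_x=1))   # ~0.5: forcing X leaves Y alone
```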
I don’t really understand what “being Bayesian about causal models” means. What makes the most sense (e.g. what people typically do) is:
(a) “be Bayesian about statistical models”, and
(b) Use additional assumptions to interpret the output of (a) causally.
(a) makes sense because I understand how evidence helps me select among sets of statistical alternatives.
(b) also makes sense, but then no one will accept your answer without actually verifying the causal model by experiment—because your assumptions linking the statistical model to a causal one may not be true. And this game of verifying these assumptions doesn’t seem like a Bayesian kind of game at all.
I don’t know what it means to use Bayes theorem to select among causal models directly.
It means that you figure out which causal models look more or less like what you observed.
More generally: There’s a language of causal models which, we think, allows us to describe the actual universe, and many other universes besides. Some of these models are simpler than others. Any given sequence of experiences has some probability of being encountered in a given causal universe.
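As a sketch of that selection process, reusing the two XOR models from the earlier aside (all numbers illustrative): each candidate causal universe assigns a likelihood to the observed experiences, and Bayes’ theorem reweights the candidates accordingly. Purely observational data leaves these two models tied, but trials run under do(X = 1) separate them.

```python
def likelihood(p_heads, data):
    # P(data | model), where data are the Y-outcomes seen under do(X = 1)
    # and p_heads is the model's prediction for P(Y = 1 | do(X = 1)).
    result = 1.0
    for y in data:
        result *= p_heads if y == 1 else 1 - p_heads
    return result

# Predictions from the two XOR models in the earlier sketch:
models = {"X -> Y": 0.9, "Y -> X": 0.5}
prior = {"X -> Y": 0.5, "Y -> X": 0.5}

data = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]   # ten forced-X trials, eight with Y = 1

evidence = sum(prior[m] * likelihood(models[m], data) for m in models)
posterior = {m: prior[m] * likelihood(models[m], data) / evidence for m in models}
print(posterior)   # ~0.81 on "X -> Y": the data favor that causal universe
```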