Although I probably agree with your point, the chosen formulation is weird. The uncertainty is hidden in the probability; “uncertain probabilities” is something of a pleonasm. I like this comment, especially
The frequency with which a coin comes up heads isn’t a probability, no matter how much it looks like one. This is what’s going wrong in the heads of people who say things like “The probability is either 1 or 0, but I don’t know which.”
Although I probably agree with your point, the chosen formulation is weird. The uncertainty is hidden in the probability; “uncertain probabilities” is something of a pleonasm.
I did spend some time thinking about exactly what this means after writing it. It seems to me there is a meaningful sense in which probabilities can be more or less uncertain, and I haven’t seen it well dealt with in discussions of probability here. If I have a coin which I have run various tests on and convinced myself is fair, then I am fairly certain the probability of it coming up heads is 0.5. I think the probability of the Republicans gaining control of Congress in November is 0.7, but I am less certain about this probability. I think this uncertainty reflects some meaningful property of my state of knowledge.
I tentatively think that this sense of ‘certainty’ reflects something about the level of confidence I have in the models of the world from which these probabilities derive. It possibly also reflects my sense of what fraction of all the non-negligibly relevant information that exists I have actually used to reach my estimate. Another possible interpretation of this sense of certainty is as an estimate of how likely I am to encounter information in the future which would significantly change my current probability estimate. A probability I am certain about is one I expect to be robust to the kinds of sensory input I think I might encounter in the future.
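One way to make that last sentence concrete is with a small simulation. The sketch below assumes a simple Beta-Bernoulli model with a uniform prior, a hypothetical batch of 20 future observations, and a threshold of 0.1 for what counts as a ‘significant’ change to the estimate; none of these choices are specified in the comment itself.

```python
# Illustrative sketch: how likely is a batch of future observations to move my
# current probability estimate by more than some threshold? (The model, prior,
# batch size, and threshold are all illustrative assumptions.)
import random

def prob_of_big_update(heads, tails, future_n=20, threshold=0.1, trials=10_000):
    """Estimate P(|updated estimate - current estimate| > threshold) by simulation."""
    current = (heads + 1) / (heads + tails + 2)  # posterior mean under a uniform prior
    big_updates = 0
    for _ in range(trials):
        # Draw a plausible "true" frequency from the current posterior, Beta(h+1, t+1),
        # then simulate future_n further observations at that frequency.
        p_true = random.betavariate(heads + 1, tails + 1)
        new_heads = sum(random.random() < p_true for _ in range(future_n))
        updated = (heads + new_heads + 1) / (heads + tails + future_n + 2)
        if abs(updated - current) > threshold:
            big_updates += 1
    return big_updates / trials

# A well-tested coin (5,000 heads, 5,000 tails) vs. a nearly untested one (1 head, 1 tail).
# Both give an estimate of 0.5, but only the first is robust to future evidence.
print(prob_of_big_update(5000, 5000))  # ~0.0
print(prob_of_big_update(1, 1))        # substantially above zero
```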
This sense of how certain or uncertain a probability is may have no place in a perfect Bayesian reasoner but I think it is meaningful information to consider as a human making decisions under uncertainty. In the context of the original comment, low probabilities are associated with rare events, and as such are the kinds of things for which we might expect to have a very incomplete model or a very sparse sampling of relevant data. They are probabilities which we might expect to easily double or halve in response to the acquisition of a relatively small amount of new sensory data.
Perhaps it’s as simple as how much you update when someone offers to make a bet with you. If you suspect your model is incomplete or you lack much of the relevant data then someone offering to make a bet with you will make you suspect they know something you don’t and so update your estimate significantly.
It seems to me there is a meaningful sense in which probabilities can be more or less uncertain
Here’s another example. Suppose you’re drawing balls from a large bin. You know the bin has red and white balls, but you don’t know how many there are of each.
After drawing two balls, you have one white and one red ball.
After drawing 100,000 balls, you have 50,000 white and 50,000 red balls.
In both cases you might assign a probability of .5 for drawing a white ball next, but it seems like in the n = 100,000 case you should be more certain of this probability than in the n = 2 case.
One could try to account for this by adding an extra criterion that specifies whether or not you expect your probability estimate to change. E.g. in the n = 100,000 case you’re .5 certain of drawing a white ball next, and .99 certain of this estimate not changing regardless of how many more trials you conduct. In the n = 2 case you would still be .5 certain of drawing a white ball next, but only .05 (or whatever your prior) certain of this being the probability you’ll eventually end up converging on.
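A minimal numerical sketch of the two cases, assuming a Beta-Bernoulli model with a uniform prior over the bin’s long-run white-ball fraction (an illustrative choice the comment doesn’t specify): both posteriors have mean 0.5, but only the large-sample one is tightly concentrated around it.

```python
# Posterior over the bin's long-run white-ball fraction after w white and r red draws,
# assuming a uniform Beta(1, 1) prior (an illustrative choice).
from scipy.stats import beta

def white_ball_posterior(white, red):
    return beta(white + 1, red + 1)  # conjugate update: Beta(w+1, r+1)

cases = {"n = 2": white_ball_posterior(1, 1),
         "n = 100,000": white_ball_posterior(50_000, 50_000)}

for name, dist in cases.items():
    mean = dist.mean()
    within = dist.cdf(0.55) - dist.cdf(0.45)  # P(fraction within 0.5 +/- 0.05)
    print(f"{name}: mean = {mean:.3f}, P(fraction in [0.45, 0.55]) = {within:.3f}")
# n = 2:       mean = 0.500, P(...) ~ 0.15
# n = 100,000: mean = 0.500, P(...) ~ 1.00
```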
This is the approach taken in Probabilistic Logic Networks, which uses ‘indefinite probabilities’ of the form <[L,U], b, k>. This stands roughly for “I assign a probability of b to the hypothesis that, after having observed k more pieces of evidence, the truth value I assign to S will lie in the interval [L, U]”.
Yes. I think this sense of how ‘certain’ I am about a probability probably corresponds to some larger scale property of a Bayesian network (some measure of how robust a particular probability is to new input data) but for humans using math to help with reasoning it might well be useful to have a more direct way of working with this concept.
This is also a problem I have thought about a bit. I plan to think about it more, organize my thoughts, and hopefully make a post about it soon, but in the meantime I’ll sketch my ideas. (It’s unfortunate that this comment appeared in a post that was so severely downvoted, as fewer people are likely to think about it now.)
There is no sense in which an absolute probability can be uncertain. Given our priors, and the data we have, Bayes’ rule can only give one answer.
However, there is a sense in which conditional probability can be uncertain. Since all probabilities in reality are conditional (at the very least, we have to condition on our thought process making any sense at all), it will be quite common in practice to feel uncertain about a probability, and to be well-justified in doing so.
Let me illustrate with the coin example. When I say that the next flip has a 50% chance of coming up heads, what I really mean is that the coin will come up heads in half of all universes that I can imagine (weighted by likelihood of occurrence) that are consistent with my observations so far.
However, we also have an estimate of another quantity, namely ‘the probability that the coin comes up heads’ (generically). I’m going to call this the weight of the coin, since that is the colloquial term. When we say that we are 50% confident that the coin comes up heads (and that we have a high degree of confidence in our estimate), we really mean that we believe that the distribution over the weight of the coin is tightly concentrated about one-half. This will be the case after 10,000 flips, but not after 5 flips. (In fact, after N heads and N tails, a weight of x has probability density proportional to [x(1-x)]^N.)
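For concreteness, a short derivation of the parenthetical claim, assuming a uniform prior over the weight (the comment leaves the prior implicit):

```latex
% Posterior over the weight x after observing N heads and N tails,
% assuming a uniform prior p(x) = 1 on [0, 1]:
p(x \mid N \text{ heads},\, N \text{ tails}) \;\propto\; p(x)\, x^{N} (1-x)^{N} \;=\; [x(1-x)]^{N}.
% This is a Beta(N+1, N+1) distribution, with mean 1/2 and variance
\mathrm{Var}(x) \;=\; \frac{(N+1)^2}{(2N+2)^2\,(2N+3)} \;=\; \frac{1}{4(2N+3)},
% which shrinks as N grows; this is the sense in which 10,000 flips, but not 5,
% leave the distribution over the weight tightly concentrated about one-half.
```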
What is important to realize is that the statement ‘the coin will come up heads with probability 50%’ means ‘I believe that in half of all conceivable universes the coin will come up heads’, whereas ‘I am 90% confident that the coin will come up heads with probability 50%’ means something more along the lines of ‘I believe that in 90% of all conceivable universes my models predict a 50% chance of heads’. But there is also the difference that in the second statement, the ‘90% of all conceivable universes’ specifies those universes only up to the level of detail at which our models take over.
I think that this is similar to what humans do when they express confidence in a probability. However, there is an important difference, as in the previous case my ‘confidence in a probability’ corresponded to some hidden parameter that dictated the results of the coin under repeated trials. The hidden parameter in most real-world situations is far less clear, and we also don’t usually get to see repeated trials (I don’t think this should matter, but unfortunately my intuition is frequentist).
This sense of how certain or uncertain a probability is may have no place in a perfect Bayesian reasoner but I think it is meaningful information to consider as a human making decisions under uncertainty.
I don’t think the key issue is the imperfect Bayesianism of humans. I suppose that the certainty of a probability under discussion has a lot to do with its dependence on priors: the more sensitive the probability is to changes in priors we find arbitrary, the less certain it feels. Priors themselves feel most uncertain, while probabilities obtained from evidence-based calculations, especially quasi-frequentist probabilities such as P(heads on the next flip), depend on many priors, and a change in any single prior doesn’t move them far. Perfect Bayesians may not have the feeling, but they still have priors.
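To put the sensitivity-to-priors point in numbers, a minimal sketch: compare how far the estimate of P(heads) moves when swapping between a few arbitrary Beta priors, with little data versus a lot of data (the specific priors and sample sizes here are illustrative assumptions).

```python
# Posterior mean of P(heads) under several Beta priors, for small and large samples.
# The particular priors and sample sizes below are illustrative assumptions.
priors = {
    "uniform Beta(1, 1)":        (1, 1),
    "weak fair-coin Beta(5, 5)": (5, 5),
    "skewed Beta(2, 8)":         (2, 8),
}

def posterior_heads(heads, tails, prior_a, prior_b):
    """Posterior mean of P(heads) under a Beta(prior_a, prior_b) prior."""
    return (heads + prior_a) / (heads + tails + prior_a + prior_b)

for label, (heads, tails) in [("2 flips (1H/1T)", (1, 1)),
                              ("10,000 flips (5,000H/5,000T)", (5000, 5000))]:
    estimates = [posterior_heads(heads, tails, a, b) for a, b in priors.values()]
    formatted = ", ".join(f"{e:.3f}" for e in estimates)
    print(f"{label}: estimates [{formatted}], spread = {max(estimates) - min(estimates):.3f}")
# With 2 flips the estimate ranges from 0.25 to 0.50 depending on the prior;
# with 10,000 flips every prior gives ~0.50, which is why it feels more certain.
```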
Sensitivity to priors is the same as sensitivity to new evidence. And when we’re sensitive to new evidence, our estimates are likely to change, which is another reason they’re uncertain.
The reason this phenomenon occurs is that we are uncertain about some fundamental frequency (or about a model more complex than a simple frequency model), and P(heads | frequency of heads is x) = x.
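Spelled out as a standard identity, with an explicit density p(x) over the unknown frequency:

```latex
% If x is the unknown fundamental frequency of heads, with density p(x),
% the probability of heads marginalizes over that uncertainty:
P(\text{heads}) \;=\; \int_0^1 P(\text{heads} \mid \text{frequency} = x)\, p(x)\, dx
              \;=\; \int_0^1 x\, p(x)\, dx \;=\; \mathbb{E}[x].
% Very different densities p(x) can share the same mean 1/2, which is why the single
% number P(heads) = 0.5 can be held with more or less certainty.
```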
I think there’s something to what you say, but a perfect Bayesian (or an imperfect human, for that matter) is conditional probabilities all the way down. When we talk about our priors regarding a particular question, they are really just the output of another chain of reasoning. The boundaries we draw to make discussion feasible are somewhat arbitrary (though for a perfect Bayesian reasoner they would probably reflect specific mathematical properties of the underlying network).
Do you think the chain of reasoning is infinite? For actual humans there is certainly some boundary below which a prior no longer feels like the output of further computation, although such beliefs could have been influenced by earlier observations, either subconsciously or consciously with the fact forgotten later. Especially in the former case, I think the reasoning leading to such beliefs is very likely to be flawed, so it seems fair to treat them as genuine priors, even if, strictly speaking, they were physically influenced by evidence.
A perfect Bayesian, on the other hand, should be immune to flawed reasoning, but it still has to be finite, so I suppose it must have some genuine priors which are part of its immutable hardware. I imagine it by analogy with formal systems, which have a finite set of axioms (or an infinite set defined by a finite set of conditions), a finite set of derivation rules, and a set of theorems consisting of the axioms and derived statements. For a Bayesian, the axioms are replaced by several statements with associated priors, Bayes’ theorem is among the derivation rules, and instead of a set of theorems it has a set of encountered statements with attached probabilities. Possible issues are:
If such a formal construction is possible, there should be a lot of literature about it, and I am unaware of any (though I didn’t try very hard to find it), and
I am not sure whether such an approach isn’t obsolete in the light of discussions about updateless decision theories and similar stuff.
Not infinite, but for humans all priors (or their non-strict-Bayesian equivalents, at least) ultimately derive either from sensory input over the individual’s lifetime or from millions of years of evolution baking ‘hard-coded’ priors into the human brain.
When dealing with any particular question you essentially draw a somewhat arbitrary line, lump millions of years of accumulated sensory input and evolutionary ‘learning’ together with a lifetime of actual learning, assign a single real number to it, and call it a ‘prior’; but this is just a way of making calculation tractable.