I prefer to quantify my lack of information and call it a prior. Then it’s even better than wrong information!
The numerical value of the prior itself doesn’t tell you how much information (or lack thereof) went into it.
What’s a simple way to state how certain you are about a prior, i.e. how stable it is against large updates based on new information? Error bars or something related don’t necessarily do the job: you might be very sure that the true Pr (EDIT: that was poorly phrased, probability is in the mind etc.; what was meant is the eventual Pr you’d end up with once you’ve hypothetically parsed all possible information, the limit) lies between 0.3 and 0.5, i.e. new information will rarely result in a posterior outside that range, even though the width of that range (wrongly) suggests that the prior is based on little information. Is there something more intuitive than Pr(0.3 < Pr(A) < 0.5) = high?
Part 1:
The idea of having a “true probability” can be extremely misleading. If I flip a coin but don’t look at it, I may call it a 50% probability of tails, but reality is sitting right there in my hand with probability 100%. The probability is not in the external world—the coin is already heads or tails. The probability is just 50% because I haven’t looked at the coin yet.
What sometimes confuses people is that there can be things in the world that we often think of as probabilities, and those can have a true value. For example, if I have an urn with 30 black balls and 70 white balls, and I pull a ball from the urn, I’ll get a black ball about 30 times out of 100. This isn’t “because the true probability is 30%”; that explanation just posits a new fundamental property which itself needs explaining. It’s because the urn is 30% black balls, and I hadn’t looked at where all the balls were yet.
Using probabilities is an admission of ignorance, of incomplete information. You don’t assign the coin a probability because it’s magically probabilistic, you use probabilities because you haven’t looked at the coin yet. There’s no “true probability” sitting out there in the world waiting for you to discover it, there’s only a coin that’s either heads or tails. And sometimes there are urns with different mixtures of balls, though of course if you can look inside the urn it’s easy to pick the ball you want.
Part 2:
Okay, so there’s no “externally objective, realio trulio probability” to compare our priors to, so how about asking how much our probability will move after we get the next bit of information?
Let’s use an example. Say I’m taking a poll, and I want to know the probability that people will vote for the Purple Party. So I ask 10 people. Now, 10 is a pretty small sample size, but say 3 out of 10 will vote for the Purple Party. So I estimate that the probability is a little more than 3⁄10. At this point, the next person I ask will cause me to change my probability by about 10% of its current value. But after I poll 1000 people, asking the next person barely changes my probability estimate. Stability!
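Here’s a minimal sketch of that in Python, assuming a Laplace rule-of-succession estimate (successes + 1) / (total + 2), which is one way to end up with “a little more than 3⁄10”; the exact numbers are illustrative only:

```python
# Minimal sketch: how much one more poll answer can move the estimate,
# using Laplace's rule of succession (successes + 1) / (total + 2)
# as the estimator (an assumption; it matches "a little more than 3/10").

def estimate(successes, total):
    return (successes + 1) / (total + 2)

for successes, total in [(3, 10), (300, 1000)]:
    now = estimate(successes, total)
    if_yes = estimate(successes + 1, total + 1)  # next person says Purple
    if_no = estimate(successes, total + 1)       # next person says not Purple
    worst_shift = max(abs(if_yes - now), abs(if_no - now)) / now
    print(f"after {total} answers: estimate {now:.3f}, "
          f"one more answer moves it by at most {worst_shift:.1%}")

# after 10 answers:   estimate 0.333, one more answer moves it by at most 15.4%
# after 1000 answers: estimate 0.300, one more answer moves it by at most 0.2%
```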
This actually works pretty well.
If you wanted to split up your hypothesis space about the poll results into mutually exclusive and exhaustive pieces (which is generally a good idea), you would have a million different hypotheses, because there are a million (well, 1,000,001) different possible numbers of Purple Party supporters. So for example there would be separate hypotheses for 300,000 Purple Party supporters vs. 300,001. Giving each of these hypotheses their own probability is sufficient to talk about the kind of stability you want. If the probabilities are concentrated on a few possible numbers, then your poll is really stable.
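To make the concentration concrete, here’s a sketch assuming a uniform prior over the underlying fraction and the resulting Beta posterior (a standard textbook model, picked here purely for illustration):

```python
# Sketch of how concentrated the distribution over "number of Purple Party
# supporters out of 1,000,000" gets, assuming a uniform prior over the
# underlying fraction and the resulting Beta posterior.

from scipy.stats import beta

POPULATION = 1_000_000

for successes, total in [(3, 10), (300, 1000)]:
    posterior = beta(successes + 1, total - successes + 1)
    lo, hi = posterior.interval(0.95)  # central 95% credible interval on the fraction
    print(f"after polling {total}: roughly {lo * POPULATION:,.0f} to "
          f"{hi * POPULATION:,.0f} supporters (95% credible)")

# after polling 10:   roughly 110,000 to 610,000 supporters
# after polling 1000: roughly 272,000 to 329,000 supporters
```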
And a good thing that it works out, because the probabilities of those million hypotheses are all of the information you have about this poll!
Note that this happens without any mention of “true probability.” We chose those million hypotheses because there are realio trulio a million different possible answers. A narrow distribution over these hypotheses represents certainty not about some true probability, but about the number of actual people out in the actual world, wearing actual purple.
So thank goodness a probability distribution over the external possibilities is all ya’ need, because it’s all ya’ got in this case.
Thanks, the “true probability” phrasing was misleading, I should’ve reread my comment before submitting. Probability is in the mind etc., what I referred to was “the probability you’d eventually end up with, having incorporated all relevant information, the limit”, which is still in your mind, but as close to “true” as you’ll get.
So you can of course say Pr(Box is empty | I saw it’s empty) = x and Pr(Box is empty | I saw it’s empty and I got to examine its inner surfaces with my hand) = y, then list all similar hypotheses about the box being empty conditioned on various experiments, and compare x, y, etc. to get a notion of the stability of your prior.
However, such a listing is quite tedious, and countably infinite as well, even if it’s the only full representation of your box-is-empty belief conditioned on all possible information.
The point was that “my prior about the box being empty is low / high / whatever” doesn’t give any information about whether you’ve just guesstimated it, or whether you’re very sure about your value and will likely discount (for the most part) any new information showing the contrary as being a fluke, or a trick. A magician seemingly countering gravity with a levitation trick only marginally lowers your prior on how gravity works.
Now when you actually talk to someone, you’ll often convey priors about many things, but less often how stable you deem those priors to be. “This die is probably loaded” … the ‘probably’ refers to your prior, but it does not say how fast that prior could change. Maybe it’s a die presented to you by a friend who collects loaded dice, so if you check it and it isn’t loaded, you’ll quickly be convinced of that. Maybe it’s your trusted loaded die from childhood, which you’ve used thousands of times, and if it doesn’t appear to be loaded on the next few throws, you’ll still consider it loaded.
Yet in both cases you’d say “the die is probably loaded”. How do you usefully convey the extra information about the stability of your prior? “The die is probably loaded, and my belief in that isn’t likely to change”, so to speak? Not a theoretical definition of stability (only listing all your beliefs can represent that), but, as in the grandparent, a simple and intuitive way of conveying that important extra information about stability, and a plea to start conveying that information.
Relevant resource: Probability is subjectively objective.
I believe this is a model space problem. We’re looking at a toy bayesian reasoner that can be easily modeled in a human mind, predicting how it will update its hypotheses about dice in response to evidence like the same number coming up too often. Our toy bayesian, of course, assigns probability 0 to encountering evidence like “my trusted expert friend says it’s loaded,” so that wouldn’t change its probabilities at all. But that’s not a flaw in bayesian reasoning; it’s a flaw in the kind of bayesian reasoner that can be easily modeled in a human mind.
This doesn’t demonstrate that human reasoning that works doesn’t have a bayesian core. E.g., I don’t know how I would update my probabilities about a die being loaded if, say, my left arm turned into a purple tentacle and started singing “La Bamba.” But it does show that even an ideal reasoner can’t always out-predict a computationally limited one, if the computationally limited one has access to a much better prior and/or a whole lot more evidence.
Error bars usually indicate a Gaussian distribution, not a flat one. If you said P = 0.4 ± 0.03, that indicates that your probability of the final estimate ending up outside the 0.3 to 0.5 range is less than a percent. This seems to meet your requirements.
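A quick numeric check of that, assuming the ± denotes one standard deviation of a Gaussian:

```python
# Quick check, assuming "0.4 +/- 0.03" means a Gaussian with mean 0.4 and
# standard deviation 0.03: how likely is the final estimate to land
# outside the 0.3..0.5 range?

from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 0.40, 0.03
p_outside = normal_cdf(0.3, mu, sigma) + (1 - normal_cdf(0.5, mu, sigma))
print(f"P(outside 0.3..0.5) = {p_outside:.4%}")  # about 0.09%, well under a percent
```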
If that doesn’t suffice, it seems that you need a full probability distribution, specifying the probability of every P-value.
Describing probabilities in terms of a mean and an approximate standard deviation, perhaps? Low standard deviation would translate to high certainty.
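As a sketch of how that could work (with made-up pseudo-counts, using a Beta distribution purely for illustration): two priors can have the same mean, the same “probably”, yet very different standard deviations, and the same handful of observations moves them very differently.

```python
# Sketch with made-up pseudo-counts: two Beta priors with the same mean
# (the same "probably") but very different standard deviations, updated on
# the same five observations of the event in question.

from math import sqrt

def beta_mean_sd(a, b):
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

for a, b, label in [(3, 7, "guesstimated prior"), (300, 700, "well-grounded prior")]:
    mean, sd = beta_mean_sd(a, b)
    post_mean, _ = beta_mean_sd(a + 5, b)  # five observations in favor
    print(f"{label}: {mean:.2f} +/- {sd:.2f} -> posterior mean {post_mean:.2f}")

# guesstimated prior:   0.30 +/- 0.14 -> posterior mean 0.53
# well-grounded prior:  0.30 +/- 0.01 -> posterior mean 0.30
```

Reporting the standard deviation (or the pseudo-counts) alongside the mean conveys exactly the stability information asked about upthread.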