Imagine a graph where the x-axis represents our probability estimate for a given statement being true and the y-axis represents our certainty that our probability estimate is correct. So if, for example, we estimate a probability of .6 that a given statement is true but we’re only mildly certain of that estimate, then our belief graph would probably look like a shallow bell curve.
I don’t understand where the bell curve is coming from. If you have one probability estimate for a given statement with some certainty about it, you would depict it as a single point on your graph.
The bell curves in this context usually represent probability distributions. The width of that probability distribution reflects your uncertainty. If you’re certain, the distribution is narrow and looks like a spike at the estimate value. If you’re uncertain, the distribution is flat(ter). A probability distribution has to have a total area of 1 under the curve, so the narrower the distribution, the taller the spike.
How likely you are to discover new evidence is neither here nor there. Even if you are very uncertain of your estimate, this does not convert into the probability of finding new evidence.
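If it helps to see that narrow-vs-flat picture concretely, here is a minimal sketch with Beta distributions standing in for the curves (the particular shapes are just illustrative, not anything canonical):

```python
import numpy as np
from scipy.stats import beta

xs = np.linspace(0, 1, 1001)
dx = xs[1] - xs[0]

# Two beliefs that both peak around 0.6, with very different certainty.
# For Beta(a, b) the mode is (a - 1) / (a + b - 2).
confident = beta(31, 21)    # narrow curve, mode = 30/50 = 0.6
uncertain = beta(2.5, 2.0)  # wide curve,   mode = 1.5/2.5 = 0.6

for name, dist in [("confident", confident), ("uncertain", uncertain)]:
    density = dist.pdf(xs)
    area = density.sum() * dx  # crude numerical integral, ≈ 1 for both
    print(f"{name}: peak height ≈ {density.max():.2f}, area ≈ {area:.3f}")
```

The narrow curve comes out much taller than the wide one, which is just the area constraint at work.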
I think you’re referring to the type of statement that can have many values. Something like “how long will it take for AGI to be developed?”. My impression (correct me if I’m wrong) is that this is what’s normally graphed with a probability distribution. Each possible value is assigned a probability, and the result is usually more or less a bell curve with the width of the curve representing your certainty.
I’m referring to a very basic T/F statement. On a normal probability distribution graph that would indeed be represented as a single point—the probability you’d assign to it being true. But we’re often not so confident in our assessment of the probability we’ve assigned, and that confidence is what I was trying to represent with the y-axis.
An example might be, “will AGI be developed within 30 years?” There’s no range of values here, so on a normal probability distribution graph you’d simply assign a probability and that’s it. But there’s a very big difference between saying “I really have not the slightest clue, but if I really must assign it a probability then I’d give it maybe 50%” vs. “I’ve researched the subject for years and I’m confident in my assessment that there’s a 50% probability”.
In my scheme, what I’m really discussing is the probability distribution of probability estimates for a given statement. So for the 30-year AGI question, what’s the probability that you’d consider a 10% probability estimate to be reasonable? What about a 90% estimate? The probability that you’d assign to each probability estimate is depicted as a single point on the graph and the result is usually more or less a bell curve.
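Here is a minimal sketch of what I mean, with a Beta curve standing in for the belief about the estimate (the shape and numbers are purely illustrative):

```python
from scipy.stats import beta

# Belief about the 30-year AGI question, expressed as a curve over candidate
# probability estimates.  Illustrative shape: Beta(4, 4), a wide bump
# centred on 0.5.
belief = beta(4, 4)

for estimate in (0.1, 0.5, 0.9):
    # The height of the curve at each candidate estimate says how
    # reasonable I consider that particular estimate to be.
    print(f"density at {estimate:.1f}: {belief.pdf(estimate):.2f}")

# If you do need a single headline number, it is the mean of the whole curve.
print(f"overall estimate: {belief.mean():.2f}")
```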
How likely you are to discover new evidence is neither here nor there. Even if you are very uncertain of your estimate, this does not convert into the probability of finding new evidence.
You’re probably correct about this. But I’ve found the concept of the kind of graph I’ve been describing to be intuitively useful, and saying that it represents the probability of finding new evidence was just my attempt at understanding what such a graph would actually mean.
In my scheme, what I’m really discussing is the probability distribution of probability estimates for a given statement.
OK, let’s rephrase it in terms of Bayesian hierarchical models. You have a model of event X happening in the future which says that the probability of that event is Y%. Y is a parameter of your model. What you are doing is giving a probability distribution for a parameter of your model (in the general case this distribution can be conditional, which makes it a meta-model, so hierarchical). That’s fine, you can do this. In this context the width of the distribution reflects how precise your estimate of the lower-level model parameter is.
The only thing is that for unique events (“will AGI be developed within 30 years”) your hierarchical model is not falsifiable. You will get a single realization (the event will either happen or it will not), but you will never get information on the “true” value of your model parameter Y. You will get a single update of your prior to a posterior and that’s it.
Is that what you have in mind?
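Concretely, that single update looks something like this (a minimal conjugate sketch with toy numbers; the Beta prior is my choice, not anything forced on you):

```python
from scipy.stats import beta

# Prior over Y = P(AGI within 30 years): a wide bump around 0.5.
a, b = 4, 4

# The one and only realization you will ever see: the event happens (1)
# or it does not (0).  Suppose it does not.
outcome = 0

# Conjugate update: Beta(a, b) -> Beta(a + outcome, b + 1 - outcome).
a_post, b_post = a + outcome, b + (1 - outcome)

print(f"prior mean of Y:     {beta(a, b).mean():.3f}")            # 0.500
print(f"posterior mean of Y: {beta(a_post, b_post).mean():.3f}")  # 0.444
# One data point, one update, and no way to ever pin down a "true" Y.
```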
I think that is what I had in mind, but it sounds from the way you’re saying it that this hasn’t been discussed as a specific technique for visualizing belief probabilities.
That surprises me since I’ve found it to be very useful, at least for intuitively getting a handle on my confidence in my own beliefs. When dealing with the question of what probability to assign to belief X, I don’t just give it a single probability estimate, and I don’t even give it a probability estimate with the qualifier that my confidence in that probability is low/moderate/high. Rather I visualize a graph with (usually) a bell curve peaking at the probability estimate I’d assign and whose width represents my certainty in that estimate. To me that’s a lot more nuanced than just saying “50% with low confidence”. It has also helped me to communicate to others what my views are for a given belief. I’d also suspect that you can do a lot of interesting things by mathematically manipulating and combining such graphs.
One problem is that it’s turtles all the way down.
What’s your confidence in your confidence probability estimate? You can represent that as another probability distribution (or another model, or a set of models). Rinse and repeat.
Another problem is that it’s hard to get reasonable estimates for all the curves that you want to mathematically manipulate. Of course you can wave hands and say that a particular curve exactly represents your beliefs and no one can say it ain’t so, but fake precision isn’t exactly useful.
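To make the regress concrete, here is a minimal sketch: put some weights over how concentrated the first-level curve should be, i.e. a distribution over the distribution (toy numbers, nothing rigorous):

```python
import numpy as np
from scipy.stats import beta

xs = np.linspace(0.001, 0.999, 999)
dx = xs[1] - xs[0]

# Level 1: belief about the estimate Y, centred on 0.5 with concentration k,
# i.e. Y ~ Beta(k/2, k/2).
# Level 2: uncertainty about k itself -- maybe I'm fairly confident (k = 40),
# maybe not (k = 4).  Illustrative weights:
levels = [(4, 0.5), (40, 0.5)]   # (concentration, weight I assign to it)

# Averaging over level 2 just leaves... another single curve over Y,
# whose exact shape is no easier to justify than the one we started with.
marginal = sum(w * beta(k / 2, k / 2).pdf(xs) for k, w in levels)

print(f"area under the averaged curve ≈ {marginal.sum() * dx:.3f}")
print(f"peak height of the averaged curve ≈ {marginal.max():.2f}")
```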
I’m referring to a very basic T/F statement. On a normal probability distribution graph that would indeed be represented as a single point—the probability you’d assign to it being true. But we’re often not so confident in our assessment of the probability we’ve assigned, and that confidence is what I was trying to represent with the y-axis.
Taken literally, the concept of “confidence in a probability” is incoherent. You are probably confusing it with one of several related concepts. Lumifer has described one example of such a concept.
Another concept is how much you think your probability estimate will change as you encounter new evidence. For example, your estimate for whether the outcome of the coin flip for the 2050 Superbowl will be heads is 1⁄2, and you are unlikely to encounter evidence that changes it (until 2050, that is). On the other hand, your estimate for the probability of AI being developed by 2050 is likely to change a lot as you encounter more evidence.
I don’t know, I think the existence of the 2050 Superbowl is significantly less than 100% likely.
What’s your line of thought?
It wouldn’t be the first time a sport has gone from vastly popular to mostly forgotten within 40 years. Jai alai was the particular example I had in mind; it was once incredibly popular, but quickly declined to the point where it’s basically entirely forgotten.
Taken literally, the concept of “confidence in a probability” is incoherent.
Why? I thought the way Lumifer expressed it in terms of Bayesian hierarchical models was pretty coherent. It might be turtles all the way down as he says, and it might be hard to use it in a rigorous mathematical way, but at least it’s coherent. (And useful, in my experience.)
Another concept is how much you think your probability estimate will change as you encounter new evidence.
This is pretty much what I meant in my original post by writing:
I usually think of the height of the curve at any given point as representing how likely I think it is that I’ll discover evidence that will change my belief. So for a low bell curve centered on .6, I think of that as meaning that I’d currently assign the belief a probability of around .6 but I also consider it likely that I’ll discover evidence (if I look for it) that can change my opinion significantly in any direction.
But expressing it in terms of how likely my beliefs are to change given more evidence is probably better. Or to say it in yet another way: how strong new evidence would need to be for me to change my estimate.
It seems like the scheme I’ve been proposing here is not a common one. So how do people usually express the obvious difference between a probability estimate of 50% for a coin flip (unlikely to change with more evidence) vs. a probability estimate of 50% for AI being developed by 2050 (very likely to change with more evidence)?
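One way I can see to make that difference concrete, in the Beta framing from above, is concentration: give both beliefs a mean of 50% but very different effective weights of evidence, and watch how much the same new evidence moves each one (toy numbers only):

```python
from scipy.stats import beta

# Two beliefs that are both "50%", with very different concentration.
coin = (500, 500)   # coin flip: a huge amount of implicit evidence behind 0.5
agi = (3, 3)        # AGI by 2050: a wide, easily moved 0.5

# Treat one strong new piece of evidence as worth, say, 5 pseudo-observations
# in favour -- a crude stand-in for a real Bayesian update.
k = 5

for name, (a, b) in [("coin flip", coin), ("AGI by 2050", agi)]:
    before = beta(a, b).mean()
    after = beta(a + k, b).mean()
    print(f"{name}: {before:.3f} -> {after:.3f}")

# coin flip:    0.500 -> 0.502   (barely moves)
# AGI by 2050:  0.500 -> 0.727   (moves a lot)
```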