Imagine that you had to give a probability density to each probability estimate you could make of Obama winning in 2012 being the correct one. You’d end up with something looking like a bell curve over probabilities, centered somewhere around “Obama has a 70% (or something) chance of winning.” Then to make a decision based on that distribution using normal decision theory, you would average over the possible results of an action, weighted by the probability. But this is equivalent to taking the mean of your bell curve—no matter how wide or narrow the bell curve, all that matters to your (standard decision theory) decision is the location of the mean.
Less evidence is like a wider bell curve, more evidence like a sharper one. But as long as the mean stays the same, the average result of each decision stays the same, so your decision will also be the same.
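This mean-only dependence is easy to check numerically. A rough sketch (the 70% figure and the particular Beta shapes are just illustrative assumptions, not anything from the original discussion):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two distributions over "probability that Obama wins": sparse evidence
# (wide Beta(7, 3)) vs. lots of evidence (narrow Beta(70, 30)).
# Both have mean 0.7.
wide = rng.beta(7, 3, 100_000)
narrow = rng.beta(70, 30, 100_000)

def expected_payoff(p_samples, win=1.0, lose=-1.0):
    # Average the payoff p*win + (1-p)*lose over the distribution of p;
    # by linearity of expectation this depends only on the mean of p.
    return np.mean(p_samples * win + (1 - p_samples) * lose)

print(expected_payoff(wide), expected_payoff(narrow))  # both ~ 0.4
```

Despite very different widths, both distributions recommend the same bet, because the expected payoff collapses to a function of the mean alone.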
So there are two kinds of precision here: the precision of the mean probability given your current (incomplete) information, which can be arbitrarily high, and the precision with which you estimate the true answer, which is the width of the bell curve. So when you say “precision,” there is a possible confusion. Your first post was about “how precise can these probabilities be,” which was the first (and boring, since it’s so high) kind of precision, while this post seems to be talking about the second kind, the kind that is more useful because it reflects how much evidence you have.
So there are two kinds of precision here: the precision of the mean probability given your current (incomplete) information, which can be arbitrarily high, and the precision with which you estimate the true answer, which is the width of the bell curve.
I’m not sure what you mean by the “true answer”. After all, in some sense the true probability is either 0 or 1; it’s just that we don’t know which.
That’s a good point. So I guess the second kind of precision doesn’t make sense in this case (like it would if the bell curve were over, say, the number of beans in a jar), and “precision” should only refer to “precision with which we can extract an average probability from our information,” which is very high.
Imagine that you had to give a probability density to each probability estimate you could make of Obama winning in 2012 being the correct one. You’d end up with something looking like a bell curve over probabilities
Bell curves prefer to live on unbounded intervals! It would be less jarring (and less convenient for you?) if he ended up with something looking like a uniform distribution over probabilities.
It’s equally convenient, since the mean doesn’t care about the shape. I don’t think it’s particularly jarring—just imagine it going to 0 at the edges.
The reason you’ll probably end up with something like a bell curve is a practical one—the central limit theorem. For complicated problems, you very often get what looks something like a bell curve. Hardly watertight, but I’d bet decent amounts of money that it is true in this case, so why not use it to add a little color to the description?
Not that I can think of, besides memory/speed constraints, and how much updating you can have done with the evidence you’ve received.
Why can’t it happen that you have so little and/or such weak evidence, that the amount of precision you should have is none at all?
Well, your prior gives you a unique value, and Bayes’ theorem is a function, so it gives you a unique value for every input.
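To spell out the function-ness: a single Bayes update maps one (prior, likelihoods) input to exactly one posterior. The specific numbers below are just for illustration:

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """One application of Bayes' theorem. Being a function, each
    input yields exactly one real-valued degree of belief as output."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# e.g. a 50% prior and evidence that is 9x likelier if the hypothesis
# is true than if it's false:
print(posterior(0.5, 0.9, 0.1))  # 0.9
```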
So the claim is that you have arbitrary precision priors. What are they, and where are they stored?
Sorry, I haven’t been very clear. A perfect Bayesian agent would have a unique real number to represent its level of belief in every hypothesis.
The betting-offer system I described above can force people (and force any hypothetical agent) to assign unique values.
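The original betting-offer system isn’t quoted in this thread, but the general idea can be sketched hypothetically: if an agent will buy a $1 bet below some price and decline it above, narrowing the offers pins down a unique number (all names here are made up for illustration):

```python
def pinned_probability(will_buy, lo=0.0, hi=1.0, rounds=50):
    """Binary-search the price at which the agent switches from
    accepting to declining a $1 bet; a coherent agent's switch
    point is a single number, its probability."""
    for _ in range(rounds):
        mid = (lo + hi) / 2
        if will_buy(mid):  # agent accepts the bet at price `mid`
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# An agent whose implicit belief is 0.7:
agent = lambda price: price < 0.7
print(pinned_probability(agent))  # ~ 0.7
```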
Of course, an actual person won’t be capable of this level of precision or coherence.
Yes, but actually computing that function is computationally intractable in all but the simplest examples.