That would be missing the point. The vNM theorem says that if you have preferences over “lotteries” (probability distributions over outcomes; like, 20% chance of winning $5 and 80% chance of winning $10) that satisfy the axioms, then your decision-making can be represented as maximizing expected utility for some utility function over outcomes. The concept of “risk aversion” is about how you react to uncertainty (how you decide between lotteries) and is embodied in the utility function; it doesn’t apply to outcomes known with certainty. (How risk-averse are you about winning $5?)
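(A small sketch of that point, not from the thread: with a concave utility function, expected-utility maximization already produces risk-averse choices between lotteries. The square-root utility and the dollar amounts are just illustrative choices.)

```python
import math

def expected_utility(lottery, u):
    """Expected utility of a lottery given as [(probability, outcome), ...]."""
    return sum(p * u(x) for p, x in lottery)

u = math.sqrt  # a concave utility function => risk aversion over lotteries

lottery = [(0.5, 5.0), (0.5, 10.0)]   # 50% $5, 50% $10; expected value $7.50
certain = [(1.0, 7.5)]                # $7.50 for sure

eu_lottery = expected_utility(lottery, u)   # 0.5*sqrt(5) + 0.5*sqrt(10) ≈ 2.699
eu_certain = expected_utility(certain, u)   # sqrt(7.5) ≈ 2.739

# The sure thing is preferred despite the equal expected dollar value:
assert eu_certain > eu_lottery
```

So "risk aversion" shows up as curvature of the utility function, with no extra machinery needed on top of the probabilities.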
In my hypothetical the two 50% probabilities are different. I want to express the difference between them. There are no sequences involved.
Obviously you’re allowed to have different beliefs about Coin 1 and Coin 2, which could be expressed in many ways. But your different beliefs about the coins don’t need to show up in your probability for a single coinflip. The reason for mentioning sequences of flips is that only then do your beliefs about Coin 1 vs. Coin 2 start making different predictions.
Would it? My interest is in constructing a framework which provides useful, insightful, and reasonably accurate models of actual human decision-making. The vNM theorem is quite useless in this respect—I don’t know what my (or other people’s) utility function is, I cannot calculate or even estimate it, a great deal of important choices can be expressed as a set of lotteries only in very awkward ways, etc. And all this is quite apart from the fact that empirical human preferences tend not to be coherent and change over time.
Risk aversion is an easily observable fact. Every day in financial markets people pay very large amounts of money in order to reduce their risk (for the same expected return). If you think they are all wrong, by all means, go and become rich off these misguided fools.
But your different beliefs about the coins don’t need to show up in your probability for a single coinflip.
Why not? As I said, I want a richer way to talk about probabilities, more complex than taking them as simple scalars. Do you think it’s a bad idea? Does St. Bayes frown upon it?
As I said, I want a richer way to talk about probabilities, more complex than taking them as simple scalars. Do you think it’s a bad idea?
That’s right, I think it’s a bad idea: it sounds like what you actually want is a richer way to talk about your beliefs about Coin 2, but you can do that using standard probability theory, without needing to invent a new field of math from scratch.
Suppose you think Coin 2 is biased and lands heads some unknown fraction _r_ of the time. Your uncertainty about the parameter _r_ will be represented by a probability distribution: say it’s normally distributed with a mean of 0.5 and a standard deviation of 0.1. The point is, the probability of _r_ having a particular value is a different question from the probability of getting heads on your first toss of Coin 2, which is still 0.5. You’d have to ask a different question than “What is the probability of heads on the first flip?” if you want the answer to distinguish the two coins. For example, the probability of getting exactly _k_ heads in _n_ flips is C(_n_, _k_)(0.5)^_k_(0.5)^(_n_−_k_) for Coin 1, but (I think?) ∫₀¹ (1/√(0.02π))_e_^−((_p_−0.5)^2/0.02) C(_n_, _k_)(_p_)^_k_(1−_p_)^(_n_−_k_) _dp_ for Coin 2.
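(A rough numerical check of those two formulas, not from the thread — the choice of n = 20 and the trapezoid integration are mine:)

```python
import math

def binom_pmf(n, k, p):
    """P(exactly k heads in n flips) for a coin with heads-rate p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(p, mu=0.5, sigma=0.1):
    return math.exp(-((p - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def coin2_pmf(n, k, steps=4000):
    # Trapezoid rule on [0, 1]; the N(0.5, 0.1) prior has negligible mass outside it.
    h = 1.0 / steps
    total = 0.0
    for i in range(steps + 1):
        p = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_pdf(p) * binom_pmf(n, k, p)
    return total * h

n = 20
pmf1 = [binom_pmf(n, k, 0.5) for k in range(n + 1)]
pmf2 = [coin2_pmf(n, k) for k in range(n + 1)]

# Both coins predict 10 heads on average...
mean1 = sum(k * q for k, q in enumerate(pmf1))
mean2 = sum(k * q for k, q in enumerate(pmf2))
# ...but the number of heads is more spread out for Coin 2:
var1 = sum((k - 10) ** 2 * q for k, q in enumerate(pmf1))
var2 = sum((k - 10) ** 2 * q for k, q in enumerate(pmf2))
```

The single-flip probabilities agree at 0.5, but over 20 flips the variance of the heads count differs (5 vs. roughly 8.8), which is exactly the information a single scalar probability can't carry.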
Suppose you think Coin 2 is biased and lands heads some unknown fraction r of the time. Your uncertainty about the parameter r will be represented by a probability distribution: say it’s normally distributed with a mean of 0.5 and a standard deviation of 0.1. The point is, the probability of r having a particular value is a different question from the probability of getting heads on your first toss of Coin 2, which is still 0.5.
A standard approach is to use the beta distribution to represent your uncertainty over the value of r.
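(A sketch of what the beta approach buys you — the particular Beta(500, 500) and Beta(5, 5) parameters below are made up for illustration:)

```python
def predictive_heads(a, b):
    # P(heads on the next flip) under a Beta(a, b) belief about r is its mean.
    return a / (a + b)

def update(a, b, heads, tails):
    # Conjugate update: observing flips just increments the pseudo-counts.
    return a + heads, b + tails

coin1 = (500, 500)   # near-certain that r = 1/2 (sharply peaked)
coin2 = (5, 5)       # centred on 1/2, but much more spread out

# A single flip can't distinguish them: both predict heads with probability 0.5...
assert predictive_heads(*coin1) == predictive_heads(*coin2) == 0.5

# ...but the same evidence moves the two beliefs by very different amounts:
coin1 = update(*coin1, 8, 2)   # Beta(508, 502): still about 0.503
coin2 = update(*coin2, 8, 2)   # Beta(13, 7): now 0.65
```

Unlike the normal distribution, the beta is supported exactly on [0, 1], so no probability mass is wasted on impossible values of r.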
but you can do that using standard probability theory
Of course I can. I can represent my beliefs about the probability as a distribution, a meta- (or a hyper-) distribution. But I’m being told that this is “meta-uncertainty” which right-thinking Bayesians are not supposed to have.
No one is talking about inventing new fields of math
say it’s normally distributed
Clearly not, since the normal distribution goes from negative infinity to positive infinity and the probability goes merely from 0 to 1.
the probability of r having a particular value is a different question from the probability of getting heads on your first toss of Coin 2, which is still 0.5
That 0.5 is conditional on the distribution of r, isn’t it? That makes it not a different question at all.
Notably, if I’m risk-averse, the risk of betting on Coin 1 looks different to me from the risk of betting on Coin 2.
(Attempted humorous allusion to how Cox’s theorem derives probability theory from simple axioms about how reasoning under uncertainty should work, less relevant if no one is talking about inventing new fields of math.)
It seems like you’ve come to an agreement, so let me ruin things by adding my own interpretation.
The coin has some propensity to come up heads. Say it will in the long run come up heads r of the time. The number r is like a probability in that it satisfies the mathematical rules of probability (in particular, the rate at which the coin comes up heads plus the rate at which it comes up tails must sum to one). But it’s a physical property of the coin, not anything to do with our opinion of it. The number r is just some particular number based on the shape of the coin (and the way it’s being tossed); it doesn’t change with our knowledge of the coin. So r isn’t a “probability” in the Bayesian sense—a description of our knowledge—it’s just something out there in the world.
Now if we have some Bayesian agent who doesn’t know r, then it must have some probability distribution over it. It could also be uncertain about the weight, w, and have a probability distribution over w. The distribution over r isn’t “meta-uncertainty” because it’s a distribution over a real physical thing in the world, not over our own internal probability assignments. The probability distribution over r is conceptually the same as the one over w.
Now suppose someone is about to flip the coin again. If we knew for certain what the value of r was, we would then assign that same value as the probability of the coin coming up heads. If we don’t know for certain what r is, then we must average over all values of r according to our distribution. The probability of the coin landing heads is its expected value, E(r).
Now E(r) actually is a Bayesian probability—it is our degree of belief that the coin will come up heads. This transformation from r being a physical property to E(r) being a probability is produced by the particular question that we are asking. If we had instead asked about the probability of the coin denting the floor, then this would depend on the weight and would be expressed as E(f(w)) for some function f representing how probable it was that the floor got dented at each weight. We don’t need a similar f in the case of r because we were free to choose the units of r so that this was unnecessary. If we had instead let r be the long-run average number of heads per 1000 flips, then we would have had to calculate the probability as E(f(r)) using f(r) = r/1000.
But the distribution over r does give you the extra information you wanted to describe. Coin 1 would have an r distribution tightly clustered around 1⁄2, whereas our distribution for Coin 2 would be more spread out. But we would have E(r) = 1⁄2 in both cases. Then, when we see more flips of the coins, our distributions change (although our distribution for Coin 1 probably doesn’t change very much; we are already quite certain) and we might no longer have that E(r_1) = E(r_2).
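(The picture above can be sketched numerically with a grid over candidate values of r. The prior widths and the 8-heads-in-10-flips data below are invented for illustration:)

```python
import math

grid = [i / 100 for i in range(1, 100)]   # candidate values for r

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def mean(dist):
    # E(r): this expectation is the Bayesian probability of heads on the next flip.
    return sum(q * r for q, r in zip(dist, grid))

# Coin 1: r tightly clustered around 1/2; Coin 2: much more spread out.
prior1 = normalize([math.exp(-((r - 0.5) ** 2) / (2 * 0.02**2)) for r in grid])
prior2 = normalize([math.exp(-((r - 0.5) ** 2) / (2 * 0.15**2)) for r in grid])
# Both priors give P(heads) = E(r) = 1/2 on a single flip, despite different beliefs.

def posterior(prior, heads, tails):
    # Bayes: reweight each candidate r by the likelihood of the observed flips.
    return normalize([q * r**heads * (1 - r)**tails for q, r in zip(prior, grid)])

# After the same 8 heads in 10 flips, the two values of E(r) come apart:
post1 = posterior(prior1, 8, 2)   # barely moves off 1/2
post2 = posterior(prior2, 8, 2)   # shifts well above 1/2
```

The spread of the r-distribution is exactly what controls how far the same evidence moves E(r).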
But it’s a physical property of the coin; not anything to do with our opinion of it.
Well, coin + environment, but sure, you’re making the point that r is not a random variable in the underlying reality. That’s fine; if we climb the turtles all the way down we’d find a philosophical debate about whether the universe is deterministic, and that’s not quite what we are interested in right now.
The distribution over r isn’t “meta-uncertainty” because it’s a distribution over a real physical thing in the world
I don’t think describing r as a “real physical thing” is useful in this context.
For example, we treat the outcome of each coin flip as stochastic, but you can easily make an argument that it is not, being a “real physical thing” instead, driven by deterministic physics.
For another example, it’s easy to add more meta-levels. Consider Alice forming a probability distribution of what Bob believes the probability distribution of r is...
This transformation from r being a physical property to E(r) being a probability is produced by the particular question that we are asking.
Isn’t r itself “produced by the particular question that we are asking”?
But the distribution over r does give you the extra information you wanted to describe.
I’m mostly interested in prescriptive rationality, and vNM is the right starting point for that (with game theory being the right next step, and more beyond, leading to MIRI’s research among other things). If you want a good descriptive alternative to vNM, check out prospect theory.
See “The Allais Paradox” for how this was covered in the vaunted Sequences.
Does St. Bayes frown upon it?

St. Cox probably does.
Notably, if I’m risk-averse, the risk of betting on Coin 1 looks different to me from the risk of betting on Coin 2.

Can you elaborate? It’s not clear to me.
Every day in financial markets people pay very large amounts of money in order to reduce their risk (for the same expected return).

Hm. Maybe those people are wrong??
Clearly not, since the normal distribution goes from negative infinity to positive infinity and the probability goes merely from 0 to 1.

That’s right; I should have either said “approximately”, or chosen a different distribution.
That 0.5 is conditional on the distribution of r, isn’t it?

Yes, it is averaging over your distribution for _r_. Does it help if you think of probability as relative to subjective states of knowledge?
Nope.
Yes, it is averaging over your distribution for r.

That’s what I thought, too, and that disagreement led to this subthread.
But if we both say that we can easily talk about distributions of probabilities, we’re probably in agreement :-)
Isn’t r itself “produced by the particular question that we are asking”?

Yes.