Bayesian rationality is also about decision making, not just beliefs. Usually people take it to mean expected utility maximization. Just assume my post said that instead.
My betting behavior w.r.t. the next coinflip is indeed the same for the two coins. My probability distributions over longer sequences of coinflips are different between the two coins. For example, P(10th flip is heads | first 9 are heads) is 1⁄2 for the first coin and close to 1 for the second coin. You can describe it as uncertainty over a hidden parameter, but you can make the same decisions without it, using only probabilities over sequences. The kind of meta-uncertainty you seem to want, that gets you out of uncomfortable bets, doesn’t exist for Bayesians.
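For concreteness, here is a minimal sketch of that claim, assuming (purely for illustration) a uniform prior over Coin 2's unknown bias; the function names are mine, and the conditional probability falls straight out of probabilities over sequences:

```python
from math import factorial

def p_sequence_fair(heads, flips):
    """P(one particular sequence with `heads` heads in `flips` flips) for the known fair coin."""
    return 0.5 ** flips

def p_sequence_unknown(heads, flips):
    """Same probability for a coin of unknown bias, averaged over a uniform prior on that bias."""
    # closed form of the integral of p^heads (1 - p)^(flips - heads) over [0, 1]
    return factorial(heads) * factorial(flips - heads) / factorial(flips + 1)

# P(10th flip is heads | first 9 are heads) = P(ten heads) / P(nine heads)
print(p_sequence_fair(10, 10) / p_sequence_fair(9, 9))        # 0.5 for Coin 1
print(p_sequence_unknown(10, 10) / p_sequence_unknown(9, 9))  # 10/11 ≈ 0.91 for Coin 2
```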
You are just rearranging the problem without solving it. Can my utility function include risk aversion? If it can, we’re back to square one: a risk-averse Bayesian rational agent.
And that’s even besides the observation that being Bayesian and being committed to expected utility maximization are orthogonal things.
I have no need for something that can get me out of uncomfortable bets since I’m perfectly fine with not betting at all. What I want is a representation for probability that is more rich than a simple scalar.
In my hypothetical the two 50% probabilities are different. I want to express the difference between them. There are no sequences involved.
That would be missing the point. The vNM theorem says that if you have preferences over “lotteries” (probability distributions over outcomes; like, 20% chance of winning $5 and 80% chance of winning $10) that satisfy the axioms, then your decision-making can be represented as maximizing expected utility for some utility function over outcomes. The concept of “risk aversion” is about how you react to uncertainty (how you decide between lotteries) and is embodied in the utility function; it doesn’t apply to outcomes known with certainty. (How risk-averse are you about winning $5?)
See “The Allais Paradox” for how this was covered in the vaunted Sequences.
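As an illustration of risk aversion living in the utility function, here is a toy sketch assuming a concave utility (square root of money, chosen purely for illustration) applied to the lottery from the comment above:

```python
from math import sqrt

utility = sqrt   # a concave utility function over money; concavity is what produces risk aversion

lottery = [(0.2, 5.0), (0.8, 10.0)]                    # the lottery from the comment above
expected_money = sum(p * x for p, x in lottery)        # $9.00
eu_lottery = sum(p * utility(x) for p, x in lottery)   # expected utility of the gamble
eu_sure_thing = utility(expected_money)                # utility of getting $9.00 for certain

print(eu_lottery)     # ≈ 2.98
print(eu_sure_thing)  # 3.00 -> the sure $9 is preferred to the lottery, i.e. risk aversion
```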
Obviously you’re allowed to have different beliefs about Coin 1 and Coin 2, which could be expressed in many ways. But your different beliefs about the coins don’t need to show up in your probability for a single coinflip. The reason for mentioning sequences of flips, is because that’s when your beliefs about Coin 1 vs. Coin 2 would start making different predictions.
Would it? My interest is in constructing a framework which provides useful, insightful, and reasonably accurate models of actual human decision-making. The vNM theorem is quite useless in this respect: I don’t know what my (or other people’s) utility function is, I cannot calculate or even estimate it, many important choices can be expressed as sets of lotteries only in very awkward ways, and so on. And that is even setting aside the fact that empirical human preferences tend not to be coherent and change over time.
Risk aversion is an easily observable fact. Every day in financial markets people pay very large amounts of money in order to reduce their risk (for the same expected return). If you think they are all wrong, by all means, go and become rich off these misguided fools.
Why not? As I said, I want a richer way to talk about probabilities, more complex than taking them as simple scalars. Do you think it’s a bad idea? Does St. Bayes frown upon it?
That’s right, I think it’s a bad idea: it sounds like what you actually want is a richer way to talk about your beliefs about Coin 2, but you can do that using standard probability theory, without needing to invent a new field of math from scratch.
Suppose you think Coin 2 is biased and lands heads some unknown fraction _r_ of the time. Your uncertainty about the parameter _r_ will be represented by a probability distribution: say it’s normally distributed with a mean of 0.5 and a standard deviation of 0.1. The point is, the probability of _r_ having a particular value is a different question from the probability of getting heads on your first toss of Coin 2, which is still 0.5. You’d have to ask a different question than “What is the probability of heads on the first flip?” if you want the answer to distinguish the two coins. For example, the probability of getting exactly _k_ heads in _n_ flips is C(_n_, _k_)(0.5)^_k_(0.5)^(_n_−_k_) for Coin 1, but (I think?) ∫₀¹ (1/√(0.02π))_e_^−((_p_−0.5)^2/0.02) C(_n_, _k_)(_p_)^_k_(1−_p_)^(_n_−_k_) _dp_ for Coin 2.
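For what it’s worth, a quick numerical sketch of the two expressions above; the function names are mine, and the tiny tails of the normal prior outside [0, 1] are ignored, so the Coin 2 number is approximate:

```python
from math import comb, exp, pi, sqrt

def coin1_prob(k, n):
    """Probability of exactly k heads in n flips of the known fair coin."""
    return comb(n, k) * 0.5 ** k * 0.5 ** (n - k)

def coin2_prob(k, n, steps=10_000):
    """Same for Coin 2: average the binomial term over the normal(0.5, 0.1) prior on p."""
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) / steps
        prior = exp(-((p - 0.5) ** 2) / 0.02) / sqrt(0.02 * pi)
        total += prior * comb(n, k) * p ** k * (1 - p) ** (n - k) / steps
    return total

print(coin1_prob(7, 10))  # ≈ 0.117
print(coin2_prob(7, 10))  # ≈ 0.13, a bit higher because the prior spreads mass away from 0.5
```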
St. Cox probably does.
A standard approach is to use the beta distribution to represent your uncertainty over the value of r.
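A minimal sketch of that approach, with parameter values chosen purely for illustration; the Beta update and its posterior mean have simple closed forms:

```python
a, b = 5.0, 5.0        # Beta(5, 5) prior over r: centred on 0.5 but fairly spread out
heads, tails = 9, 0    # observed flips

prior_p_heads = a / (a + b)                      # P(heads on the first flip) = 0.5
a_post, b_post = a + heads, b + tails            # conjugate Beta update
posterior_p_heads = a_post / (a_post + b_post)   # P(heads on the next flip)

print(prior_p_heads)      # 0.5
print(posterior_p_heads)  # 14/19 ≈ 0.74
```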
Of course I can. I can represent my beliefs about the probability as a distribution, a meta- (or a hyper-) distribution. But I’m being told that this is “meta-uncertainty” which right-thinking Bayesians are not supposed to have.
No one is talking about inventing new fields of math
Clearly not, since the normal distribution goes from negative infinity to positive infinity while the probability goes merely from 0 to 1.
That 0.5 is conditional on the distribution of r, isn’t it? That makes it not a different question at all.
Notably, if I’m risk-averse, the risk of betting on Coin 1 looks different to me from the risk of betting on Coin 2.
Can you elaborate? It’s not clear to me.
Hm. Maybe those people are wrong??
That’s right; I should have either said “approximately”, or chosen a different distribution.
Yes, it is averaging over your distribution for _r_. Does it help if you think of probability as relative to subjective states of knowledge?
(Attempted humorous allusion to how Cox’s theorem derives probability theory from simple axioms about how reasoning under uncertainty should work, less relevant if no one is talking about inventing new fields of math.)
Nope.
That’s what I thought, too, and that disagreement led to this subthread.
But if we both say that we can easily talk about distributions of probabilities, we’re probably in agreement :-)
It seems like you’ve come to an agreement, so let me ruin things by adding my own interpretation.
The coin has some propensity to come up heads. Say it will in the long run come up heads r of the time. The number r is like a probability in that it satisfies the mathematical rules of probability (in particular, the rate at which the coin comes up heads plus the rate at which it comes up tails must sum to one). But it’s a physical property of the coin, not anything to do with our opinion of it. The number r is just some particular number determined by the shape of the coin (and the way it’s being tossed); it doesn’t change with our knowledge of the coin. So r isn’t a “probability” in the Bayesian sense (a description of our knowledge); it’s just something out there in the world.
Now if we have some Bayesian agent who doesn’t know r, then it must have some probability distribution over it. It could also be uncertain about the weight, w, and have a probability distribution over w. The distribution over r isn’t “meta-uncertainty” because it’s a distribution over a real physical thing in the world, not over our own internal probability assignments. The probability distribution over r is conceptually the same as the one over w.
Now suppose someone is about to flip the coin again. If we knew for certain what the value of r was we would then assign that same value as the probability of the coin coming up heads. If we don’t know for certain what r is then we must therefore average over all values of r according to our distribution. The probability of the coin landing heads is its expected value, E(r).
Now E(r) actually is a Bayesian probability: it is our degree of belief that the coin will come up heads. This transformation from r being a physical property to E(r) being a probability is produced by the particular question that we are asking. If we had instead asked about the probability of the coin denting the floor, then this would depend on the weight and would be expressed as E(f(w)) for some function f representing how probable it is that the floor gets dented at each weight. We don’t need a similar f in the case of r because we were free to choose the units of r so that this was unnecessary. If we had instead let r be the long-run number of heads per 1000 flips, then we would have had to calculate the probability as E(f(r)) using f(r) = r/1000.
But the distribution over r does give you the extra information you wanted to describe. Coin 1 would have an r distribution tightly clustered around 1⁄2, whereas our distribution for Coin 2 would be more spread out. But we would have E(r) = 1⁄2 in both cases. Then, when we see more flips of the coins, our distributions change (although our distribution for Coin 1 probably doesn’t change very much; we are already quite certain) and we might no longer have that E(r_1) = E(r_2).
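A small sketch of this last point, using Beta distributions as a convenient stand-in for a “tightly clustered” versus a “spread out” belief about r (the particular shapes are my choice, not anything from the comment above):

```python
def expected_r(a, b):
    """Mean of a Beta(a, b) distribution over r."""
    return a / (a + b)

coin1 = (100.0, 100.0)   # belief about Coin 1: tightly clustered around 1/2
coin2 = (1.0, 1.0)       # belief about Coin 2: spread out (uniform)

print(expected_r(*coin1), expected_r(*coin2))   # 0.5 and 0.5: same E(r) before any flips

heads, tails = 3, 0      # suppose both coins are now seen to land heads three times
print(expected_r(coin1[0] + heads, coin1[1] + tails))   # ≈ 0.507: barely moves
print(expected_r(coin2[0] + heads, coin2[1] + tails))   # 0.8: moves a lot
```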
Well, coin + environment, but sure, you’re making the point that r is not a random variable in the underlying reality. That’s fine; if we climb down the turtles all the way, we’d find a philosophical debate about whether the universe is deterministic, and that’s not quite what we are interested in right now.
I don’t think describing r as a “real physical thing” is useful in this context.
For example, we treat the outcome of each coin flip as stochastic, but you can easily make an argument that it is not, being a “real physical thing” instead, driven by deterministic physics.
For another example, it’s easy to add more meta-levels. Consider Alice forming a probability distribution over what Bob believes the probability distribution of r is...
Isn’t r itself “produced by the particular question that we are asking”?
Yes.
I’m mostly interested in prescriptive rationality, and vNM is the right starting point for that (with game theory being the right next step, and more beyond, leading to MIRI’s research among other things). If you want a good descriptive alternative to vNM, check out prospect theory.
Yes. There is nothing preventing you from assigning a value equal to -$1,000 to the state of affairs, “I made a bet and lost $100.” This would simply mean that you consider two situations equally valuable, for example one in which you have been robbed of $1,000, and another in which you made a bet and lost $100.
Assigning such values does nothing to prevent you from having a mathematically consistent utility function, and it does not imply any necessary violation of the VNM axioms.
That doesn’t follow, since there’s also nothing preventing you from assigning a value equal to -$2,000 to the state of affairs “I was robbed of $1,000.”
Someone who has risk aversion in Lumifer’s sense might assign a value of -$2,000 to “I was robbed of $1,000 because I left my door unlocked,” but they will not assign that value to “I took all reasonable precautions and was robbed anyway.” The latter is considered not as bad.
Specifically, people assign a negative value to the thought, “If only I had taken such precautions I would not have suffered this loss.” If there are no precautions they could have taken, there will be no such regret. Even if there are some precautions, if they are unusual and expensive ones, the regret will be much less, if it exists at all.
Refusing a bet is naturally an obvious precaution, so losses that result from accepting bets will be assigned high negative values in this scheme.
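A toy sketch of this scheme, with numbers that are purely illustrative: outcomes are individuated by how the loss came about, so one consistent utility function can assign the bet loss and the unavoidable robbery the same value:

```python
def utility(money_change, avoidable_by_obvious_precaution):
    """Value of an outcome, where losses that could easily have been avoided carry a regret penalty."""
    regret_penalty = 900 if avoidable_by_obvious_precaution else 0
    return money_change - regret_penalty

print(utility(-100, avoidable_by_obvious_precaution=True))    # -1000: lost $100 on a bet I could have refused
print(utility(-1000, avoidable_by_obvious_precaution=False))  # -1000: robbed despite reasonable precautions
```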
The richer structure you seek for those two coins is your distribution over their probabilities. They’re both 50% likely to come up heads, given the information you have. You should be willing to make exactly the same bets about them, assuming the person offering you the bet has no more information than you do. However, if you flip each coin once and observe the results, your new probability estimates for the next flips are now different.
For example, for the second coin you might have a uniform distribution (ignorance prior) over the set of all possible probabilities. In that case, if you observe a single flip that comes up heads, your probability that the next flip will be heads is now 2⁄3.
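A quick check of the 2⁄3 figure under that uniform prior; this just evaluates the two quantities P(first two flips are heads) and P(first flip is heads) numerically and takes their ratio:

```python
steps = 100_000
grid = [(i + 0.5) / steps for i in range(steps)]
p_head_then_head = sum(p * p for p in grid) / steps   # integral of p^2 over [0, 1] ≈ 1/3
p_first_head = sum(p for p in grid) / steps           # integral of p over [0, 1] ≈ 1/2
print(p_head_then_head / p_first_head)                # ≈ 0.6667 = 2/3
```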
Yes, I understand that. This subthread started when cousin_it said that the kind of meta-uncertainty I seem to want doesn’t exist for Bayesians, at which point I objected.