Bayesian rationality is also about decision making, not just beliefs. Usually people take it to mean expected utility maximization. Just assume my post said that instead.
My betting behavior w.r.t. the next coinflip is indeed the same for the two coins. My probability distributions over longer sequences of coinflips are different between the two coins. For example, P(10th flip is heads | first 9 are heads) is 1⁄2 for the first coin and close to 1 for the second coin. You can describe it as uncertainty over a hidden parameter, but you can make the same decisions without it, using only probabilities over sequences. The kind of meta-uncertainty you seem to want, that gets you out of uncomfortable bets, doesn’t exist for Bayesians.
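For concreteness, here is a minimal sketch of that claim, assuming (purely for illustration) a uniform prior over Coin 2's unknown bias; the function names are mine, and the conditional probability falls straight out of probabilities over sequences:

```python
from math import factorial

def p_sequence_fair(heads, flips):
    """P(one particular sequence with `heads` heads in `flips` flips) for the known fair coin."""
    return 0.5 ** flips

def p_sequence_unknown(heads, flips):
    """Same probability for a coin of unknown bias, averaged over a uniform prior on that bias."""
    # closed form of the integral of p^heads (1 - p)^(flips - heads) over [0, 1]
    return factorial(heads) * factorial(flips - heads) / factorial(flips + 1)

# P(10th flip is heads | first 9 are heads) = P(ten heads) / P(nine heads)
print(p_sequence_fair(10, 10) / p_sequence_fair(9, 9))        # 0.5 for Coin 1
print(p_sequence_unknown(10, 10) / p_sequence_unknown(9, 9))  # 10/11 ≈ 0.91 for Coin 2
```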
You are just rearranging the problem without solving it. Can my utility function include risk aversion? If it can, we’re back to square one: a risk-averse Bayesian rational agent.
And that’s even besides the observation that being Bayesian and being committed to expected utility maximization are orthogonal things.
I have no need for something that can get me out of uncomfortable bets since I’m perfectly fine with not betting at all. What I want is a representation for probability that is more rich than a simple scalar.
In my hypothetical the two 50% probabilities are different. I want to express the difference between them. There are no sequences involved.
That would be missing the point. The vNM theorem says that if you have preferences over “lotteries” (probability distributions over outcomes; like, 20% chance of winning $5 and 80% chance of winning $10) that satisfy the axioms, then your decision-making can be represented as maximizing expected utility for some utility function over outcomes. The concept of “risk aversion” is about how you react to uncertainty (how you decide between lotteries) and is embodied in the utility function; it doesn’t apply to outcomes known with certainty. (How risk-averse are you about winning $5?)
See “The Allais Paradox” for how this was covered in the vaunted Sequences.
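As an illustration of risk aversion living in the utility function, here is a toy sketch assuming a concave utility (square root of money, chosen purely for illustration) applied to the lottery from the comment above:

```python
from math import sqrt

utility = sqrt   # a concave utility function over money; concavity is what produces risk aversion

lottery = [(0.2, 5.0), (0.8, 10.0)]                    # the lottery from the comment above
expected_money = sum(p * x for p, x in lottery)        # $9.00
eu_lottery = sum(p * utility(x) for p, x in lottery)   # expected utility of the gamble
eu_sure_thing = utility(expected_money)                # utility of getting $9.00 for certain

print(eu_lottery)     # ≈ 2.98
print(eu_sure_thing)  # 3.00 -> the sure $9 is preferred to the lottery, i.e. risk aversion
```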
Obviously you’re allowed to have different beliefs about Coin 1 and Coin 2, which could be expressed in many ways. But your different beliefs about the coins don’t need to show up in your probability for a single coinflip. The reason for mentioning sequences of flips, is because that’s when your beliefs about Coin 1 vs. Coin 2 would start making different predictions.
Would it? My interest is in constructing a framework which provides useful, insightful, and reasonably accurate models of actual human decision-making. The vNM theorem is quite useless in this respect: I don’t know what my (or other people’s) utility function is, I cannot calculate or even estimate it, many important choices can be expressed as sets of lotteries only in very awkward ways, and so on. And that is even setting aside the fact that empirical human preferences tend not to be coherent and change over time.
Risk aversion is an easily observable fact. Every day in financial markets people pay very large amounts of money in order to reduce their risk (for the same expected return). If you think they are all wrong, by all means, go and become rich off these misguided fools.
Why not? As I said, I want a richer way to talk about probabilities, more complex than taking them as simple scalars. Do you think it’s a bad idea? Does St. Bayes frown upon it?
That’s right, I think it’s a bad idea: it sounds like what you actually want is a richer way to talk about your beliefs about Coin 2, but you can do that using standard probability theory, without needing to invent a new field of math from scratch.
Suppose you think Coin 2 is biased and lands heads some unknown fraction _r_ of the time. Your uncertainty about the parameter _r_ will be represented by a probability distribution: say it’s normally distributed with a mean of 0.5 and a standard deviation of 0.1. The point is, the probability of _r_ having a particular value is a different question from the probability of getting heads on your first toss of Coin 2, which is still 0.5. You’d have to ask a different question than “What is the probability of heads on the first flip?” if you want the answer to distinguish the two coins. For example, the probability of getting exactly _k_ heads in _n_ flips is C(_n_, _k_)(0.5)^_k_(0.5)^(_n_−_k_) for Coin 1, but (I think?) ∫₀¹ (1/√(0.02π))_e_^−((_p_−0.5)^2/0.02) C(_n_, _k_)(_p_)^_k_(1−_p_)^(_n_−_k_) _dp_ for Coin 2.
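For what it’s worth, a quick numerical sketch of the two expressions above; the function names are mine, and the tiny tails of the normal prior outside [0, 1] are ignored, so the Coin 2 number is approximate:

```python
from math import comb, exp, pi, sqrt

def coin1_prob(k, n):
    """Probability of exactly k heads in n flips of the known fair coin."""
    return comb(n, k) * 0.5 ** k * 0.5 ** (n - k)

def coin2_prob(k, n, steps=10_000):
    """Same for Coin 2: average the binomial term over the normal(0.5, 0.1) prior on p."""
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) / steps
        prior = exp(-((p - 0.5) ** 2) / 0.02) / sqrt(0.02 * pi)
        total += prior * comb(n, k) * p ** k * (1 - p) ** (n - k) / steps
    return total

print(coin1_prob(7, 10))  # ≈ 0.117
print(coin2_prob(7, 10))  # ≈ 0.13, a bit higher because the prior spreads mass away from 0.5
```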
St. Cox probably does.
A standard approach is to use the beta distribution to represent your uncertainty over the value of r.
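A minimal sketch of that approach, with parameter values chosen purely for illustration; the Beta update and its posterior mean have simple closed forms:

```python
a, b = 5.0, 5.0        # Beta(5, 5) prior over r: centred on 0.5 but fairly spread out
heads, tails = 9, 0    # observed flips

prior_p_heads = a / (a + b)                      # P(heads on the first flip) = 0.5
a_post, b_post = a + heads, b + tails            # conjugate Beta update
posterior_p_heads = a_post / (a_post + b_post)   # P(heads on the next flip)

print(prior_p_heads)      # 0.5
print(posterior_p_heads)  # 14/19 ≈ 0.74
```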
Of course I can. I can represent my beliefs about the probability as a distribution, a meta- (or a hyper-) distribution. But I’m being told that this is “meta-uncertainty” which right-thinking Bayesians are not supposed to have.
No one is talking about inventing new fields of math
Clearly not, since the normal distribution goes from negative infinity to positive infinity while the probability goes merely from 0 to 1.
That 0.5 is conditional on the distribution of r, isn’t it? That makes it not a different question at all.
Notably, if I’m risk-averse, the risk of betting on Coin 1 looks different to me from the risk of betting on Coin 2.
Can you elaborate? It’s not clear to me.
Hm. Maybe those people are wrong??
That’s right; I should have either said “approximately”, or chosen a different distribution.
Yes, it is averaging over your distribution for _r_. Does it help if you think of probability as relative to subjective states of knowledge?
(Attempted humorous allusion to how Cox’s theorem derives probability theory from simple axioms about how reasoning under uncertainty should work, less relevant if no one is talking about inventing new fields of math.)
Nope.
That’s what I thought, too, and that disagreement led to this subthread.
But if we both say that we can easily talk about distributions of probabilities, we’re probably in agreement :-)
It seems like you’ve come to an agreement, so let me ruin things by adding my own interpretation.
The coin has some propensity to come up heads. Say it will in the long run come up heads r of the time. The number r is like a probability in that it satisfies the mathematical rules of probability (in particular, the rate at which the coin comes up heads plus the rate at which it comes up tails must sum to one). But it’s a physical property of the coin, not anything to do with our opinion of it. The number r is just some particular number determined by the shape of the coin (and the way it’s being tossed); it doesn’t change with our knowledge of the coin. So r isn’t a “probability” in the Bayesian sense (a description of our knowledge); it’s just something out there in the world.
Now if we have some Bayesian agent who doesn’t know r, then it must have some probability distribution over it. It could also be uncertain about the weight, w, and have a probability distribution over w. The distribution over r isn’t “meta-uncertainty” because it’s a distribution over a real physical thing in the world, not over our own internal probability assignments. The probability distribution over r is conceptually the same as the one over w.
Now suppose someone is about to flip the coin again. If we knew for certain what the value of r was we would then assign that same value as the probability of the coin coming up heads. If we don’t know for certain what r is then we must therefore average over all values of r according to our distribution. The probability of the coin landing heads is its expected value, E(r).
Now E(r) actually is a Bayesian probability: it is our degree of belief that the coin will come up heads. This transformation from r being a physical property to E(r) being a probability is produced by the particular question that we are asking. If we had instead asked about the probability of the coin denting the floor, then this would depend on the weight and would be expressed as E(f(w)) for some function f representing how probable it is that the floor gets dented at each weight. We don’t need a similar f in the case of r because we were free to choose the units of r so that this was unnecessary. If we had instead let r be the long-run number of heads per 1000 flips, then we would have had to calculate the probability as E(f(r)) using f(r) = r/1000.
But the distribution over r does give you the extra information you wanted to describe. Coin 1 would have an r distribution tightly clustered around 1⁄2, whereas our distribution for Coin 2 would be more spread out. But we would have E(r) = 1⁄2 in both cases. Then, when we see more flips of the coins, our distributions change (although our distribution for Coin 1 probably doesn’t change very much; we are already quite certain) and we might no longer have that E(r_1) = E(r_2).
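A small sketch of this last point, using Beta distributions as a convenient stand-in for a “tightly clustered” versus a “spread out” belief about r (the particular shapes are my choice, not anything from the comment above):

```python
def expected_r(a, b):
    """Mean of a Beta(a, b) distribution over r."""
    return a / (a + b)

coin1 = (100.0, 100.0)   # belief about Coin 1: tightly clustered around 1/2
coin2 = (1.0, 1.0)       # belief about Coin 2: spread out (uniform)

print(expected_r(*coin1), expected_r(*coin2))   # 0.5 and 0.5: same E(r) before any flips

heads, tails = 3, 0      # suppose both coins are now seen to land heads three times
print(expected_r(coin1[0] + heads, coin1[1] + tails))   # ≈ 0.507: barely moves
print(expected_r(coin2[0] + heads, coin2[1] + tails))   # 0.8: moves a lot
```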
Well, coin + environment, but sure, you’re making the point that r is not a random variable in the underlying reality. That’s fine; if we climb down the turtles all the way, we’d find a philosophical debate about whether the universe is deterministic, and that’s not quite what we are interested in right now.
I don’t think describing r as a “real physical thing” is useful in this context.
For example, we treat the outcome of each coin flip as stochastic, but you can easily make an argument that it is not, being a “real physical thing” instead, driven by deterministic physics.
For another example, it’s easy to add more meta-levels. Consider Alice forming a probability distribution over what Bob believes the probability distribution of r is...
Isn’t r itself “produced by the particular question that we are asking”?
Yes.
I’m mostly interested in prescriptive rationality, and vNM is the right starting point for that (with game theory being the right next step, and more beyond, leading to MIRI’s research among other things). If you want a good descriptive alternative to vNM, check out prospect theory.
Yes. There is nothing preventing you from assigning a value equal to -$1,000 to the state of affairs, “I made a bet and lost $100.” This would simply mean that you consider two situations equally valuable, for example one in which you have been robbed of $1,000, and another in which you made a bet and lost $100.
Assigning such values does nothing to prevent you from having a mathematically consistent utility function, and it does not imply any necessary violation of the VNM axioms.
That doesn’t follow, since there’s also nothing preventing you from assigning a value equal to -$2,000 to the state of affairs “I was robbed of $1,000.”
Someone who has risk aversion in Lumifer’s sense might assign a value of -$2,000 to “I was robbed of $1,000 because I left my door unlocked,” but they will not assign that value to “I took all reasonable precautions and was robbed anyway.” The latter is considered not as bad.
Specifically, people assign a negative value to the thought, “If only I had taken such precautions I would not have suffered this loss.” If there are no precautions they could have taken, there will be no such regret. Even if there are some precautions, if they are unusual and expensive ones, the regret will be much less, if it exists at all.
Refusing a bet is naturally an obvious precaution, so losses that result from accepting bets will be assigned high negative values in this scheme.
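A toy sketch of this scheme, with numbers that are purely illustrative: outcomes are individuated by how the loss came about, so one consistent utility function can assign the bet loss and the unavoidable robbery the same value:

```python
def utility(money_change, avoidable_by_obvious_precaution):
    """Value of an outcome, where losses that could easily have been avoided carry a regret penalty."""
    regret_penalty = 900 if avoidable_by_obvious_precaution else 0
    return money_change - regret_penalty

print(utility(-100, avoidable_by_obvious_precaution=True))    # -1000: lost $100 on a bet I could have refused
print(utility(-1000, avoidable_by_obvious_precaution=False))  # -1000: robbed despite reasonable precautions
```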
The richer structure you seek for those two coins is your distribution over their probabilities. They’re both 50% likely to come up heads, given the information you have. You should be willing to make exactly the same bets about them, assuming the person offering you the bet has no more information than you do. However, if you flip each coin once and observe the results, your new probability estimates for the next flips are now different.
For example, for the second coin you might have a uniform distribution (ignorance prior) over the set of all possible probabilities. In that case, if you observe a single flip that comes up heads, your probability that the next flip will be heads is now 2⁄3.
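A quick check of the 2⁄3 figure under that uniform prior; this just evaluates the two quantities P(first two flips are heads) and P(first flip is heads) numerically and takes their ratio:

```python
steps = 100_000
grid = [(i + 0.5) / steps for i in range(steps)]
p_head_then_head = sum(p * p for p in grid) / steps   # integral of p^2 over [0, 1] ≈ 1/3
p_first_head = sum(p for p in grid) / steps           # integral of p over [0, 1] ≈ 1/2
print(p_head_then_head / p_first_head)                # ≈ 0.6667 = 2/3
```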
Yes, I understand that. This subthread started when cousin_it said that the kind of meta-uncertainty I seem to want doesn’t exist for Bayesians, at which point I objected.