Ordinary disagreements persist after hearing others’ estimates. A and B may start out asserting “50” and “10”, and then argue their way to “25” and “12”, then “23” and “17”. But if you want each estimate to be as accurate as possible, this is silly behavior; if A can predict that his estimate will go down over time (as he integrates more of B’s evidence), he can also predict that his current estimate is too high—and so he can improve his accuracy by lowering his estimate right now. The two parties should be as likely to overshoot as to undershoot in their disagreements, e.g.:
A: 50; B: 10
A: 18; B: 22
A: 21; B: 21.
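To make that concrete, here is a minimal numerical sketch, with the model and the error variances invented purely for illustration: if each party’s estimate is an unbiased signal of the truth with a known error variance, the best combined answer is the precision-weighted average, and a rational party should jump to it in one step rather than drift toward it over several rounds.

```python
# A toy sketch, not anything from the original exchange: two unbiased
# estimates with (assumed) known error variances. The rational move is to
# jump straight to the pooled estimate, not to inch toward it.

def pooled_estimate(estimates, variances):
    """Precision-weighted average of independent, unbiased estimates."""
    weights = [1.0 / v for v in variances]
    return sum(w * x for w, x in zip(weights, estimates)) / sum(weights)

# Hypothetical numbers: A says 50, B says 10, and suppose B's estimate is
# four times as precise (variance 100 vs. 400). That is purely an assumption.
print(pooled_estimate([50.0, 10.0], [400.0, 100.0]))  # 18.0
```

Under these made-up variances, both parties should say “18” immediately; the drawn-out “25”/“12”, “23”/“17” sequence is exactly the predictable drift the paragraph above calls silly.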
So next time you’re in a dispute, try applying Principle 3: ask what an outside observer would say about the situation. If Alfred and Betty both apply this principle, they’ll each ask: “What would an outside observer guess about Lake L, given that Betty has studied geography and said ‘10’, while Alfred said ‘50’?” And, thus viewing the situation from the (same) outside, Betty and Alfred will both weigh Betty’s evidence about equally. Alfred may underweight Betty’s impression (e.g., because he doesn’t realize she wrote her thesis on Lake L), but he may equally overweight it (e.g., because he doesn’t realize that she’s never heard of Lake L either). If he could predict that he was over- or under-weighting her opinion, he’d quit doing it.
More precisely: if you and your interlocutor can predict your direction of disagreement, at least one of you is forming needlessly inaccurate estimates.
Before reading your reply: I assumed that Alfred would lower his estimate a lot, while Betty might raise hers a little. I expected Betty’s estimate to still end up lower than Alfred’s, though the size of these effects would depend on how much more geography Betty knows than Alfred.
After reading your reply: I think you’re right about convergence, and definitely right about driving your answer toward what you think is correct as fast as possible rather than holding back for fear of seeming to give in.
It’s an interesting problem, and you’re not doing it justice.
A and B have a prior based on certain evidence. Their first guess conveys only the mean of that prior. You also posit that they have a shared belief about the (expected) amount of evidence behind their prior.
To update at each iteration, they need to infer what evidence about the world is behind the exchange of guesses so far.
I don’t agree with anything you’ve claimed about this scenario. I’ll grant you any simplifying assumptions you need to prove it, but let’s be clear about what those assumptions are.
If they’re only similarly rational rather than perfectly rational, they’ll probably both be biased toward their own estimates. It also depends on common-knowledge assumptions. As far as I know, two people can both be perfectly rational while each thinks the other is irrational (or thinks the other is rational but believes them to be irrational, and therefore won’t update), and so they never reach an equilibrium. So I would disagree with your statement that:
if you and your interlocutor can predict your direction of disagreement, at least one of you is forming needlessly inaccurate estimates
In general, the insights needed to answer the questions at the end of the post go beyond what one can learn from the ultra-simple “everyone can see the same evidence” example at the start of the post, I think.
Re: Problem 2:
Take an even probability distribution over your feelings and your roommate’s feelings on housework (and over who’s emotionally biased). You have no reason to treat your feelings and your roommate’s as asymmetrically indicative (unless unbiased indicators have told you that you’re especially above- or below-average at this sort of thing). It’s like the thermometers, again.
Re: Problem 3:
Keep your belief in atheism. Your evidence against a Christian god is way stronger than any evidence provided by your roommate’s assertion. Despite the superficial symmetry with Problem 2, the prior against the complex hypothesis of a Christian god is many orders of magnitude stronger than the prior against you being wilfully mistaken about the housework—and these orders of magnitude matter.
(Though note that this reasoning only works because such “extraordinary claims” are routinely made without extraordinary evidence; psychology and anthropology indicate that p(your roommate’s assertion | no Christian god) is relatively large—much larger than a simplicity prior would assign to p(Christian god), or p(flying spaghetti monster).)
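To put rough numbers on the orders-of-magnitude point (every figure below is invented for illustration, not a value I am defending), here is the odds form of Bayes’ theorem applied to the roommate’s assertion:

```python
# Toy Bayesian update in odds form; every number is an invented illustration.
prior_odds = 1e-20              # assumed simplicity-prior odds for a Christian god
p_assert_given_god = 0.99       # chance the roommate asserts it, if it were true
p_assert_given_no_god = 0.5     # such assertions are routinely made anyway

likelihood_ratio = p_assert_given_god / p_assert_given_no_god
posterior_odds = prior_odds * likelihood_ratio
print(posterior_odds)  # ~2e-20: a factor-of-two nudge against a 20-order prior
```

Because p(assertion | no Christian god) is large, the likelihood ratio is small, and the posterior barely moves.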
No, problems 2 and 3 are symmetrical in a more than superficial way. In both cases, the proper course of action is to attempt to conduct an unbiased evaluation of the evidence and of the biases affecting each of you. The difference is, in problem 3, we have already encountered and evaluated numerous nearly identical situations, so it is easy to come to the proper decision, whereas in problem 2, the situation could be new and unique, and missing background information about the effects of bias on the two individuals and the accuracy of their predictions becomes important.
The descriptions of both problem 2 and problem 3 indicate possible bias in both participants. It’s therefore reasonable to cool down first, and then check the evidence.
In problem 3, the roommate might point out valid criticisms of biases one might have, while still being wrong on the question itself. Either way, it’s not rational to argue in the heat of the moment.
Before reading your answers:
Problem 2: Given the stated conditions (“you feel strongly that you could never have such biases” is unlikely in my case, but taking it as fact), I would tentatively interpret my roommate’s remarks as indicating his frustration rather than my disposition. However, I would take the probability of being mistaken as high enough that I would attempt to find some way to defuse the situation that would work either way—most likely, arbitration from a mutually trusted party.
Problem 3: I would quickly review what I know about the debate, and conclude that I have received no additional evidence one way or the other. I would continue to be confident in my naturalist worldview.
After reading your answers:
Problem 2: I notice that you interpret “you feel strongly that you could never have such biases” differently to how I interpret it—I would not feel thus without an observed track record of myself supporting that conclusion. My actions are scarcely changed from those implied by your judgement, however.
Problem 2: I’d work on finding out what criteria we were using. In general, I believe that I can tell when I’m going off balance. I’m not sure if I can test this, but I get the impression that most people have no clue at all about when they’re going off balance. I will also note that even if I feel I’m going off balance, there may not be anything I can do about it in the short run.
Problem 3: I’m an agnostic, not an atheist. That being said, I would notice that the Christian is using a circular system of proof, and not agree with them.
Re: problem 1: Jelly bean number estimates are just like thermometer readings, except that the reading is in someone’s head, rather than their hand. So the obvious answer is to average everyone’s initial, solitary impressions, absent reason to expect one individual or another is an above-average (or below-average) estimator.
If your friends use lopsided weighting schemes in their second answers, should you re-update? This depends a lot on your friends.
Don’t re-update from their answers if you think they don’t understand the merits of averaging; you want to weight each person’s raw impression evenly, not to overweight it based on how many others were randomly influenced by it (cf. information cascades: http://en.wikipedia.org/wiki/Information_cascade).
Do re-update if your friends understand the merits of averaging, such that their apparent over-weighting of a few people’s data points suggests they know something you don’t (e.g., perhaps your friend Julie has won past championships in jelly-bean estimation, and everyone but you knows it).
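A toy illustration of the cascade worry (the setup and all numbers here are my own assumptions): if each successive guesser publicly reports a blend of their private impression and the running mean of earlier public answers, the public average ends up overweighting the first few impressions.

```python
import random

random.seed(0)
TRUE_COUNT = 800  # hypothetical true number of jelly beans

# Everyone receives an independent, noisy private impression.
private = [random.gauss(TRUE_COUNT, 150) for _ in range(20)]

# Cascading public answers: each person splits the difference between their
# own impression and the running mean of earlier answers, so early noise
# propagates into every later answer.
public = []
for impression in private:
    if public:
        impression = 0.5 * impression + 0.5 * (sum(public) / len(public))
    public.append(impression)

print(sum(private) / len(private))  # mean of raw impressions: better aggregate
print(sum(public) / len(public))    # mean of public answers: pulled toward
                                    # whatever the first few people said
```

Both averages are unbiased here, but the public one effectively counts the earliest impressions many times over, which is why you’d rather average everyone’s raw first impressions.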
Since I know those people, I would weight their answers according to my best estimate of their skill at such tasks, and then average the whole group, including me.
Doing this correctly can get pretty complicated. Basically, the more people you have, the less you should weight the low-quality estimates compared to the high-quality estimates.
For example, suppose that “good” thermometers are unbiased and “bad” thermometers are all biased in the same direction, but you don’t know which direction.
If you have one thermometer which you know is good, and one which you’re 95% sure is good, then you should weight both measurements about the same.
But if you have 10^6 thermometers which you know are good, and 10^6 which you’re 95% sure are good, then you should pretty much ignore the possibly-bad ones.
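One rough way to formalize this (the inverse-mean-squared-error weighting and the specific noise and bias numbers are my own assumptions, not anything from the comment above): treat each group’s mean as a single estimate and weight it by the inverse of its expected squared error. The good group’s error shrinks with group size; the possibly-bad group’s shared bias does not.

```python
# Heuristic sketch under assumed numbers: per-thermometer noise variance 1.0,
# a possible shared bias of +/-1.0 degrees, and a 5% chance the second group
# is biased. Weight each group's mean by inverse expected squared error.

NOISE_VAR = 1.0
BIAS = 1.0
P_BAD = 0.05

def group_weights(n):
    good_mse = NOISE_VAR / n                     # averages away as n grows
    bad_mse = NOISE_VAR / n + P_BAD * BIAS ** 2  # the shared bias term does not
    w_good, w_bad = 1.0 / good_mse, 1.0 / bad_mse
    total = w_good + w_bad
    return w_good / total, w_bad / total

print(group_weights(1))        # ~(0.51, 0.49): weight both about the same
print(group_weights(10 ** 6))  # ~(1.0, 0.0): ignore the possibly-bad group
```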
Before reading your answer: Human beings are bad at estimating volumes, as opposed to lengths. I would form my estimate by observing the apparent density of jellybeans in the jar (e.g. by examining a square-centimeter cross-section), observing the dimensions, and multiplying (a toy version of this calculation is sketched below). Then, on the second stage, I would discard estimates which are radically different from mine (cutoff to be chosen based on the observed distribution) and take the mean of the remaining. I would allow my choice of which data to include to be influenced by those whose data I was already inclined to include in my average.
After reading your answer: Should I notice an apparent and popular upweighting of certain responses such as you suggest, I would increase the weight of those in my average.
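For concreteness, a back-of-the-envelope version of the density-times-dimensions method from the “before” answer above, with every measurement invented for illustration:

```python
import math

# Hypothetical measurements for a cylindrical jar.
beans_per_cm3 = 0.9      # estimated by inspecting a small cross-section
radius_cm = 8.0
height_cm = 25.0
packing_allowance = 0.8  # assumed correction for air gaps and edge effects

volume_cm3 = math.pi * radius_cm ** 2 * height_cm
estimate = beans_per_cm3 * volume_cm3 * packing_allowance
print(round(estimate))   # ~3619 beans, under these made-up numbers
```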
I would look for response clusters. Each participant could have a different counting method yielding different results (e.g., estimating volumes, counting radius and height, or allowing for an empty cone at the top which you don’t see), and some methods could be common pitfalls. Therefore, some results—those obtained by a wrong way of counting—should be discarded; otherwise the median result would lead away from the right one. To decide which is the right response cluster, it would be useful to try to figure out each method or mistake and determine the correct one. Of course, your method is not necessarily the right one just because it’s yours.
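One crude way to find such response clusters (the relative-gap threshold below is an arbitrary assumption, and real data might need something smarter): sort the guesses and start a new cluster wherever there is an unusually large jump between neighbors.

```python
def split_clusters(guesses, rel_gap=0.4):
    """Group sorted guesses; start a new cluster when the next guess exceeds
    the previous one by more than rel_gap (an arbitrary assumed threshold)."""
    ordered = sorted(guesses)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur > prev * (1 + rel_gap):
            clusters.append([])
        clusters[-1].append(cur)
    return clusters

# Hypothetical guesses: a low cluster (one counting method) and a high
# cluster (another); inspecting each beats taking the overall median.
print(split_clusters([310, 330, 345, 360, 820, 870, 900]))
# [[310, 330, 345, 360], [820, 870, 900]]
```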
My take on how to get the best estimates, in separate comments for tidier discussion threads:
Re: Problem 4: Roughly speaking: yes.
Not that it matters tremendously, but I was thinking of the jelly bean problem.
What kind of weighted average?
My math isn’t good enough to formalize it—I’d do it by feel.
Drat—likewise.
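For what it’s worth, one standard formalization (it is my assumption that this is the kind of weighting meant): if each person’s skill can be summarized as an error variance, weight each answer by the inverse of that variance.

```python
# Sketch under the assumption that "skill at such tasks" can be summarized
# as an error variance: lower variance means more skill and more weight.
def skill_weighted_mean(answers, error_vars):
    weights = [1.0 / v for v in error_vars]
    return sum(w * a for w, a in zip(weights, answers)) / sum(weights)

# Hypothetical: three friends' guesses, with the second twice as reliable.
print(skill_weighted_mean([700, 850, 640], [1e4, 5e3, 1e4]))  # 760.0
```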