This is a crucial observation if you’re trying to use this technique to calibrate your own accuracy! You can’t just start making bets when no one you associate with regularly is challenging you to bets.
Several years ago, I started taking note of every time I disagreed with someone and looking up who was right. Initially, though, I only counted myself as having “disagreed with other people” if they said something I thought was wrong and I attempted to correct them. Soon after, I added the cases where they corrected me and I argued back. During this period, I went from thinking I was about 90% accurate in my claims to believing I was far more accurate than that. I would go months without being wrong, and this was in college, so I was frequently getting into disagreements, probably three a day on average during the school year. Then I started checking the times other people corrected me just as carefully as the times I corrected them (counting even the times I made no attempt to argue back), and my accuracy rate plummeted.
Another thing I would recommend to people starting out with this: keep track of your record with individual people, not just your overall record. My accuracy rate with a few people is far lower than my overall rate, and my overall rate is inflated because I know a few argumentative people who are frequently wrong. (This would probably change if we were actually betting money and only counting arguments those people were willing to bet on, so your approach adjusts for this better than mine does.) There are several people with whom I’m close to 50%, and two people for whom I have several data points and my accuracy is below 50%.
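The per-person bookkeeping described above is simple enough to sketch in code. This is just a minimal illustration of the idea, not anything I actually used; the names and function signatures are made up for the example:

```python
from collections import defaultdict

# Log each resolved disagreement under the other person's name,
# so per-person accuracy can be compared against the overall rate.
records = defaultdict(lambda: {"right": 0, "wrong": 0})

def log_disagreement(person, i_was_right):
    """Record the outcome of one resolved disagreement."""
    records[person]["right" if i_was_right else "wrong"] += 1

def accuracy(person):
    """My accuracy rate in disagreements with this person, or None if no data."""
    r = records[person]
    total = r["right"] + r["wrong"]
    return r["right"] / total if total else None

# Hypothetical data: two disagreements each with two people.
log_disagreement("Alice", True)
log_disagreement("Alice", True)
log_disagreement("Bob", False)
log_disagreement("Bob", True)
print(accuracy("Alice"), accuracy("Bob"))  # 1.0 0.5
```

Even a toy tracker like this makes the distortion visible: a 100% record against one frequently-wrong person pulls the overall average up without saying anything about your record against anyone else.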
There’s one other point somebody needs to make about calibration, and that’s that 75% accuracy on the claims you disagree about is not the same thing as 75% accuracy overall. 75% information fidelity is atrocious; 95% information fidelity is not much better. Human brains are defective in a lot of ways, but they aren’t that defective! Except at doing math: brains are ridiculously bad at math relative to how easily machines can be built to be good at it. For most intents and purposes, 99% isn’t a very high percentage. I am not a particularly good driver, but I haven’t gotten into a collision with another vehicle in well over 1,000 trips behind the wheel. Percentages tend to have an exponential scale to them (or more accurately, a logistic curve). You don’t have to be a particularly good driver to avoid getting into an accident 99.9% of the time you drive, because that is only a few orders of magnitude of improvement in the odds relative to 50%.
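The logistic-scale point can be made concrete by converting probabilities to log-odds, where equal steps correspond to equal multiplicative changes in the odds. A quick sketch (the bits unit is just one conventional choice of logarithm base):

```python
import math

def log_odds(p):
    """Log-odds of probability p, in bits (base-2 log of p/(1-p)).
    Each +1 bit doubles the odds ratio, so 50% -> 75% is the same
    size of step, on this scale, as 99% -> ~99.5%."""
    return math.log2(p / (1 - p))

for p in [0.5, 0.75, 0.95, 0.99, 0.999]:
    print(f"{p:>6}: {log_odds(p):+.2f} bits")
```

On this scale, 50% sits at 0 bits, 99% at about 6.6 bits, and 99.9% at about 10 bits, which is why moving from 99% to 99.9% is a much bigger improvement than the raw percentages suggest.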
Information fidelity differs from information retention. Discarding 25%, or 95%, or more of the information you collect is reasonable; corrupting information at that rate is what I’m saying would be horrendous. (Discarding information conserves resources, whereas corrupting information does not… except to the extent that you would consider lossy, as opposed to lossless, compression to be corruption, but I would still count that as discarding information. Episodic memory is either very compressed or very corrupted, depending on which you think it should be.)
In my experience, people are actually more likely to be underconfident about factual information than they are to be overconfident, if you measure confidence on an absolute scale instead of a relative-to-other-people scale. My family goes to trivia night, and we almost always get at least as many correct as we expect to get correct, usually more. However, other teams typically score better than we expect them to score too, and we win the round less often than we expect to.
Think back to grade school, when tests actually had fill-in-the-blank and multiple-choice questions. I’m going to guess that you were probably an A student and got around 95% right on your tests… because a) that’s about what I did, and I tend to project, b) you’re on LessWrong, so you were probably an A student, and c) you say you feel like you ought to be right about 95% of the time. I’m also going to guess (again, because I tend to project my experience onto other people) that you felt a lot less than 95% confident, on average, while you were taking those tests. There were more than a few tests in my time in school where I walked out thinking, “I didn’t know any of that; I’ll probably get a 70 or better only because even that would be horribly bad compared to what I usually do, but I really feel like I failed”… and it was never a 70. (Math was the one exception, where I tended to be overconfident; I usually made more mistakes than I expected to on my math tests.)
Where calibration really breaks down is with subjects far outside the domain of normal experience, especially if you know that you know more about the domain than your peer group does. People are not good at thinking about abstract mathematics, artificial intelligence, physics, evolution, and other subjects that happen at a different scale from everyday life. When I was 17, I thought I understood quantum mechanics just because I’d read A Brief History of Time and The Universe in a Nutshell… Boy, was I wrong!
On LessWrong, we are usually discussing subjects that are way beyond the domain of normal human experience, so we tend to be overconfident in our understanding of these subjects… but part of the reason for this overconfidence is that we do tend to be correct about most of the things we encounter within the confines of routine experience.