Since there was no option for “not a clue”, I left these fields blank. I do not believe they add anything.
That’s punting.
The question was about an event related to a well-known figure in world history. So even if you literally have no idea, your best guess for that reference class is “sometime between the year 0 and 2000”. The middle of this range is 1000. The probability that this lands within 15 years of the correct answer by sheer luck (a window of about 30 years out of 2000) is roughly 1 in 65.
However, it just isn’t true that you didn’t have a clue. Given the name of the person and even a very rough idea of who they were, I’m pretty sure any LW reader could do considerably better than that; at the least narrow it down to two or maybe three centuries, for something like a 1 in 7 to 1 in 10 chance.
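A quick sanity check on that arithmetic (a minimal sketch; `hit_chance` is just an illustrative name, and “within 15 years” counts both sides of the guess):

```python
# Chance that a uniform random guess over a date range lands within
# +/-15 years of the true year: a window of 2*15+1 = 31 whole years.
def hit_chance(range_years, tolerance=15):
    window = 2 * tolerance + 1
    return min(window / range_years, 1.0)

print(hit_chance(2000))  # ~0.0155, about 1 in 65 (the "no idea" prior)
print(hit_chance(300))   # ~0.10, about 1 in 10 (narrowed to three centuries)
print(hit_chance(200))   # ~0.155, about 1 in 6.5 (narrowed to two centuries)
```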
Yup; the only Principia Mathematica I’d ever heard of was the one by Russell and Whitehead. I leveraged this shocking lack of knowledge into a guess that Newton lived after Galileo and before Gauss, and put down 10% on 1750, which by the rule of thumb HonoreDB came up with puts me right on the edge of overconfidence.
Yeah. I got all panicky when I encountered the question (“Argh! Newton! How can I have nothing memorized about someone as important as Newton!”). By somewhat similar reasoning I got an answer and assigned about 1⁄3 probability to my being within 15 years. I ended up within 10 years of the correct answer. By HonoreDB’s rule that would be neither over- nor underconfident. But on discovering the answer I couldn’t help thinking, “rats—I should have been more confident”. I get a sense that thinking about scoring rules too much as a game can also lead to some biases.
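That last worry has a standard answer: a proper scoring rule is designed so that your expected score is maximized by reporting your honest credence, which makes the post-hoc “I should have been more confident” feeling pure hindsight. A minimal sketch (the 1/3 is from the comment above; the log score is my stand-in for whatever rule the survey implies):

```python
import math

# Expected log score for reporting probability `report` on a yes/no
# event ("was my guess within 15 years?") when your honest belief is
# `belief`. For a proper rule, this peaks at report == belief.
def expected_log_score(report, belief):
    return belief * math.log(report) + (1 - belief) * math.log(1 - report)

belief = 1 / 3
for report in (0.2, 1 / 3, 0.5, 0.9):
    print(f"report {report:.2f}: expected score {expected_log_score(report, belief):+.3f}")
# The expectation is highest at report = 0.33, so bolder reports only
# look better after the fact, when the event happened to come out true.
```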
I said that “I do not believe they add anything”, so no point engaging in the games where someone presumes that they do.
That sounds like a bad faith answer to me.
For one thing, you have no problem with the survey’s designer “presuming” that the other questions in the survey are valuable; why do you reverse that judgement only in the case of the question that troubles you?
For another, your rejection was based on the lack of a “not a clue” option, and you haven’t refuted my point that this option would be punting.
It’s possible that the reason I’m bothered by your dismissal is that I ended up spending more time on this one question than the rest of the survey altogether.
You would come across as more sincere if you just said “I couldn’t be bothered to answer that question”.
“‘I Don’t Know’” is a relevant Sequence post.
(Here’s the “Rerunning the Sequences” page.)
[09:02] X: i’ve never seen it and neither have you
[09:02] Eliezer: 10 to 1000

The next line should be:
Apple trees have zero apples most of the time.
Non-apple trees have no apples all of the time.
The quoted estimate sounds poorly calibrated and likely wrong.
An ordinary connotation of “How many X are there?” is that there aren’t any well-known reasons for there to be no X at all. If I ask you how many apples there are and you later find out that it’s actually a maple tree outside, then you would likely consider me not to be communicating in good faith — to be asking the question to make a point rather than to actually obtain information about apples.
I get your point. To add further weight to it: the snippet above is from an informal, likely fast-paced IM conversation, which makes considered analysis come across as pedantic and socially uncalibrated.
That said, I found the 10 to 1000 estimate surprising.
The person asking the question hasn’t seen the tree. He is merely picking one out of the woods.
Say a tree has 100 apples. Come late autumn, the apples fall: from a hundred to ten, to one, and then to none.
The fact that ordinary apple trees ordinarily have no apples at least raises the possibility that the true count is zero.
0 to 1000 apples would certainly be correct, which is what we want.
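To put a toy model behind that, here is a quick Monte Carlo sketch of “count the apples on a tree picked at random out of the woods”; every parameter in it (fraction of apple trees, time in season, yield) is invented for illustration:

```python
import random

random.seed(0)

def apples_on_random_tree():
    # Toy model; all numbers are made up for illustration.
    if random.random() > 0.1:        # most trees aren't apple trees
        return 0
    if random.random() > 0.25:       # apples hang on the tree ~3 months a year
        return 0
    return random.randint(10, 1000)  # an apple tree in season

counts = sorted(apples_on_random_tree() for _ in range(100_000))
lo, hi = counts[5_000], counts[95_000 - 1]  # central 90% interval
print(lo, hi)
# Both endpoints come out 0: nearly every sampled tree bears no apples,
# so an interval like "10 to 1000" misses the truth almost every time,
# while one that starts at 0 does not.
```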
How many dollars are in my wallet? (I haven’t looked.)
I don’t know what it is, exactly, about that exchange, but I have a rather souring reaction to it. “Contrary” comes closest to the term I want, though it definitely isn’t right: when someone insists on using numerical responses to a statement “in order to prevent confusion” and then admits that doing so is as likely to induce confusion, the whole exercise seems rather unfruitful.
Is it really so hard to say “There is insufficient data for a meaningful reply”?
How does guessing the answers add to the rest of the survey?
It’s a calibration test, which is (more or less) a test of how well you judge the accuracy of your guesses. How accurate your guess itself was is not important here except when judged in the light of the confidence level you assigned to the 15-years-either-way confidence interval.
Except for people who happened to have the year of [redacted event] memorized, everyone’s answer to the first of these two questions was a guess. Some people’s guesses were more educated than others, but the important part is not the accuracy of the guess, but how well the accuracy of the guess tracks the confidence level.
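For concreteness, a minimal sketch of what judging that looks like; the rows and the true year are made up (not the survey’s redacted answer):

```python
TRUE_YEAR = 1800  # hypothetical stand-in, not the survey's answer

# (guessed year, stated probability that the guess is within 15 years)
answers = [
    (1795, 0.50),
    (1900, 0.10),
    (1812, 0.70),
    (1688, 0.30),
]

for guess, confidence in answers:
    hit = abs(guess - TRUE_YEAR) <= 15
    print(f"guess {guess}, stated {confidence:.0%}, within 15 years: {hit}")

# Calibration is judged in aggregate: among all answers stated at ~50%
# confidence, did about half land inside the interval? The raw accuracy
# of any single guess is beside the point.
```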