I’m curious how they handle model error (the error when your model is totally wrong).
They punish it. That is, your stated credence should include both your ‘inside view’ error of “How confident is my mythology module in this answer?” and your ‘outside view’ error of “How confident am I in my mythology module?”
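A minimal sketch of that two-layer credence, under an assumed model that is not the game’s own rule: with probability r your mythology module is working, and you get its inside-view credence; otherwise you do no better than chance (50% on a two-option question). The function name and parameters are hypothetical.

```python
def stated_credence(inside: float, module_reliability: float, fallback: float = 0.5) -> float:
    """Blend an inside-view credence with an outside-view check on the module.

    Hypothetical model: when the module is unreliable on a question, your answer
    is assumed to be no better than `fallback` (0.5 for a two-option question).
    """
    if not (0.0 <= inside <= 1.0 and 0.0 <= module_reliability <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    # Weighted mix: trust the module with probability r, fall back to chance otherwise.
    return module_reliability * inside + (1.0 - module_reliability) * fallback


# The module says 90%, but you trust the module itself only 70% of the time.
print(round(stated_credence(0.9, 0.7), 2))  # 0.78
```

Stating the raw 90% instead would ignore the outside-view error, which is exactly what a proper scoring rule punishes on the questions where the module is wrong.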
One of the primary benefits of playing a Credence Game like this one is that it gives you a sense of those outside view confidences. I am, for example, able to tell which of two American postmasters general came first at the 60% level, simply by using the heuristic of “which of these names sounds more old-timey?”, but am at the 50% level (i.e. pure chance) in determining which sports team won a game by comparing their names.
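One hypothetical way to turn play into those outside-view numbers is to tag each answer with the heuristic or category you used and compute a hit rate per category. The category labels and the sample log below are made up for illustration; they are not real results.

```python
from collections import defaultdict


def hit_rates(records):
    """Return the fraction of correct answers for each question category.

    `records` is an iterable of (category, was_correct) pairs; the category
    labels are whatever you choose to track (hypothetical here).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, was_correct in records:
        totals[category] += 1
        correct[category] += int(was_correct)
    return {c: correct[c] / totals[c] for c in totals}


# Illustrative, made-up log of answers (not actual game data).
log = [("postmasters", True), ("postmasters", False), ("postmasters", True),
       ("sports", True), ("sports", False)]
print(hit_rates(log))
```

With enough answers per category, a rate near 50% tells you that heuristic is worth nothing, whatever your inside view says.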
But it seems hard to guess beforehand that the question you thought you were answering wasn’t the question that you were being asked!
This is the sort of thing you learn by answering a bunch of questions from the same person, or by having a lawyer-sense of “how many qualifications would I need to add to or remove from this sentence to be sure?”.
OK, so all that makes sense and seems basically correct, but I don’t see how you get from there to being able to map confidence for persons across a question the same way you can for questions across a person.
Adopting that terminology, I’m saying that a typical Less Wrong user likely has much the same understanding-the-question module as everyone else. This module will be right most of the time and wrong some of the time, so each user correctly applies the outside view error to each of their estimates. Since the understanding-the-question module is similar for each person, though, the actual errors aren’t evenly distributed across questions: users will underestimate their accuracy on “easy” questions and overestimate it on “hard” ones, if easy and hard are determined afterwards by the percentage of answerers who get the question right.
That seems reasonable to me, yes, since one easy way for a question to be ‘hard’ is for most answerers to interpret it differently from the questioner.
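A toy simulation of the shared-module argument, under assumed parameters (none of these numbers come from the game). Each question is misread by everyone at once with some probability, and every simulated answerer states the same, correctly averaged credence. Overall they come out calibrated, but once questions are labelled “easy” or “hard” after the fact by the fraction who got them right, the easy group beats the stated credence and the hard group falls short of it. The function and parameter names are hypothetical.

```python
import random


def simulate(n_questions=2000, n_people=50, misread_rate=0.2,
             acc_understood=0.9, acc_misread=0.1, seed=0):
    """Simulate a shared understanding-the-question module across people.

    Hypothetical model: each question is misread by *everyone* with probability
    `misread_rate`. On an understood question each person is independently
    right with `acc_understood`; on a misread one, with `acc_misread`.
    """
    rng = random.Random(seed)
    # Everyone applies the same, correct outside-view adjustment.
    stated = (1 - misread_rate) * acc_understood + misread_rate * acc_misread
    fractions = []
    for _ in range(n_questions):
        misread = rng.random() < misread_rate  # shared across all people
        p = acc_misread if misread else acc_understood
        n_right = sum(rng.random() < p for _ in range(n_people))
        fractions.append(n_right / n_people)
    overall = sum(fractions) / len(fractions)
    # Label difficulty afterwards, by the fraction who got each question right.
    easy = [f for f in fractions if f >= overall]
    hard = [f for f in fractions if f < overall]
    return {
        "stated": stated,
        "overall": overall,
        "easy": sum(easy) / len(easy),
        "hard": sum(hard) / len(hard),
    }


print(simulate())
```

If the misreadings were instead independent per person (each answerer misreading different questions), the errors would spread evenly and the post-hoc easy/hard split would show much less of this gap; the correlation is what concentrates the misses on a few “hard” questions.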