I do discuss this exact point in the above lengthy comment, and I allow for this possibility. Here is the relevant part:
The first possible path towards accurate calibration is when the same person performs essentially the same judgment many times, and from the past performance we extract the frequency with which their brain tends to produce the right answer. If this level of accuracy remains roughly constant in time, then it makes sense to attach it as the probability to that person’s future judgments on the topic. This approach treats the relevant operations in the brain as a black box whose behavior, being roughly constant, can be subjected to such extrapolation.
Now clearly, the critical part is to ensure that the future judgments are based on the same parts of the person’s brain and that the relevant features of these parts, as well as the problem being solved, remain unchanged. In practice, these requirements can be satisfied by people who have reached the peak of ability achievable by learning from experience in solving some problem that repeatedly occurs in nearly identical form. Still, even in the best case, we’re talking about a very limited number of questions and people here.
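To make the frequency-extraction idea concrete, here is a minimal sketch that treats the judge as a black box; the track record is purely hypothetical:

```python
# Extract a calibration figure from a hypothetical track record of the
# same judgment repeated many times, then attach it to the next judgment.

past_judgments = [True, True, False, True, True, True, False, True]  # was each past answer correct?

calibration = sum(past_judgments) / len(past_judgments)  # 6/8 = 0.75
print(f"attach p = {calibration:.2f} to the next judgment on the same problem")
```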
I know you have limited it to repeated judgments about essentially the same question. I was rather asking why, and I am still not sure whether I interpret it correctly. Is it that the judgments themselves are possibly produced by different parts of the brain, or that the person’s self-evaluations of certainty are produced by different parts of the brain, or both? And if so, so what?
Imagine a test is done on a particular person. During five consecutive years he is asked a lot of questions (of all different types), and he has to give an answer and a subjective feeling of certainty. Afterwards, we see that the answers he labeled as “almost certain” were right in 83%, 78%, 81%, 84% and 85% of cases in the five years. Let’s even say that the experimenters were careful enough to divide the questions into different topics, and establish that his “almost certain” answers about medicine were right 94% of the time on average and his “almost certain” answers about politics were right 56% of the time on average. All other topics were near the overall average.
Do you 1) maintain that such stable results are very unlikely to happen, or 2) that even if most people can be calibrated in such a way, it still doesn’t justify using them for measuring probabilities?
I know you have limited it to repeated judgments about essentially the same question. I was rather asking why, and I am still not sure whether I interpret it correctly. Is it that the judgments themselves are possibly produced by different parts of the brain, or that the person’s self-evaluations of certainty are produced by different parts of the brain, or both? And if so, so what?
We don’t really know, but it could certainly be both, and it may well be that the same parts of the brain are not equally reliable for all questions they are capable of processing. Therefore, while simple inductive reasoning tells us that consistent accuracy on the same problem can be extrapolated, there are no grounds to generalize to other questions, since they may involve different parts of the brain, or the same part functioning in different modes that don’t have the same accuracy.
Unless, of course, we cover all such various parts and modes and obtain some sort of weighted average over them, which I suppose is the point of your thought experiment, of which more below.
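To make this concrete, here is a toy decomposition using the numbers from your scenario (94% medicine, 56% politics, roughly 82% elsewhere); the topic shares are my own invention:

```python
# A stable overall calibration figure can decompose into very different
# per-topic (per-process) accuracies; it is a weighted average that holds
# only for this particular mix of questions.

topic_accuracy = {"medicine": 0.94, "politics": 0.56, "other": 0.82}
topic_share    = {"medicine": 0.10, "politics": 0.10, "other": 0.80}  # assumed mix

overall = sum(topic_accuracy[t] * topic_share[t] for t in topic_accuracy)
print(f"overall calibration = {overall:.2f}")  # 0.81 for this mix only
```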
Do you 1) maintain that such stable results are very unlikely to happen, or 2) that even if most people can be calibrated in such a way, it still doesn’t justify using them for measuring probabilities?
If the set of questions remains representative—in the sense of querying the same brain processes with the same frequency—the results could turn out to be fairly stable. This could conceivably be achieved by large and wide-ranging sets of questions. (I wonder if someone has actually done such experiments?)
However, the result could be replicated only if the same person is again asked similar large sets of questions that are representative with regard to the frequencies with which they query different brain processes. Relative to that reference class, it clearly makes sense to attach probabilities to answers, so, yes, here we would have another counterexample to my original claim, for another peculiar meaning of probabilities.
The trouble is that these probabilities would be useless for any purpose that doesn’t involve another similar representative set of questions. In particular, sets of questions about some particular topic that is not representative would presumably not replicate them, and thus they would be a very bad guide for betting that is limited to some particular topic (as it nearly always is). Thus, this seems like an interesting theoretical exercise, but not a way to obtain practically useful numbers.
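As a toy illustration of how badly this can go, suppose someone bets at fair odds for their mix-calibrated p = 0.81, but only on politics questions, where (per your scenario) their true accuracy is 0.56:

```python
# Betting at fair odds for the mix-calibrated probability on a narrow
# topic where the true accuracy is much lower loses money on average.

p_claimed = 0.81  # calibration on the representative mix of questions
p_true = 0.56     # actual accuracy on politics-only questions
stake = 1.0

payout = stake * (1 - p_claimed) / p_claimed  # fair-odds payout on a win
expected_value = p_true * payout - (1 - p_true) * stake
print(f"expected value per politics bet: {expected_value:+.2f}")  # about -0.31
```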
(I should add that I never thought about this scenario before, so my reasoning here might be wrong.)
If there are any experimental psychologists reading this, maybe they can organise the experiment. I am curious whether people can indeed be calibrated on general questions.