I know you have limited it to repeated judgments about essentially the same question. I was rather asking why, and I am still not sure whether I interpret it correctly. Is it that the judgments themselves are possibly produced by different parts of the brain, or that the person's self-evaluations of certainty are produced by different parts of the brain, or both? And if so, so what?
We don’t really know, but it could certainly be both, and also it may well be that the same parts of the brain are not equally reliable for all questions they are capable of processing. Therefore, while simple inductive reasoning tells us that consistent accuracy on the same problem can be extrapolated, there is no ground to generalize to other questions, since they may involve different parts of the brain, or the same part functioning in different modes that don’t have the same accuracy.
Unless, of course, we cover all such various parts and modes and obtain some sort of weighted average over them, which I suppose is the point of your thought experiment, of which more below.
Do you 1) maintain that such stable results are very unlikely to happen, or 2) that even if most people can be calibrated in such a way, this still doesn't justify using them for measuring probabilities?
If the set of questions remains representative—in the sense of querying the same brain processes with the same frequency—the results could turn out to be fairly stable. This could conceivably be achieved by large and wide-ranging sets of questions. (I wonder if someone has actually done such experiments?)
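To make the idea concrete, here is a minimal sketch of how such an experiment might be scored, assuming we simply record each stated confidence together with whether the answer turned out to be correct (the function name and the rounding to 10% levels are my own illustration, not any established protocol):

```python
from collections import defaultdict

def calibration_table(judgments):
    """Compare stated confidence with observed accuracy.

    judgments: iterable of (confidence, correct) pairs, where confidence
    is the subject's stated probability (e.g. 0.7) and correct is a bool.
    Returns {confidence_level: (observed_accuracy, n_judgments)}.
    """
    groups = defaultdict(list)
    for confidence, correct in judgments:
        # Group judgments by stated confidence, rounded to the nearest 10%.
        groups[round(confidence, 1)].append(correct)
    return {
        level: (sum(outcomes) / len(outcomes), len(outcomes))
        for level, outcomes in sorted(groups.items())
    }

# A well-calibrated subject's 70%-confidence answers should be correct
# roughly 70% of the time, their 90%-confidence answers roughly 90%, etc.
sample = [(0.7, True), (0.7, True), (0.7, False),
          (0.9, True), (0.9, True), (0.9, True), (0.9, False)]
print(calibration_table(sample))
```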
However, the result could be replicated only if the same person is again asked similar large sets of questions that are representative with regard to the frequencies with which they query different brain processes. Relative to that reference class, it clearly makes sense to attach probabilities to answers, so, yes, here we would have another counterexample to my original claim, for another peculiar meaning of probabilities.
The trouble is that these probabilities would be useless for any purpose that doesn’t involve another similar representative set of questions. In particular, sets of questions about some particular topic that is not representative would presumably not replicate them, and thus they would be a very bad guide for betting that is limited to some particular topic (as it nearly always is). Thus, this seems like an interesting theoretical exercise, but not a way to obtain practically useful numbers.
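As a toy illustration of that failure mode, suppose (purely hypothetically, with made-up numbers) that two different brain processes answer questions on two topics with different reliabilities. A confidence level that is well calibrated on a representative mix of both topics is then systematically wrong once the questions are restricted to one topic:

```python
import random

random.seed(0)

# Two hypothetical "brain processes" with different reliabilities
# (the topics and numbers are invented purely for illustration).
ACCURACY = {"geography": 0.9, "music": 0.6}

def observed_accuracy(topics, n=100_000):
    """Fraction of correct answers over a random stream of questions
    drawn uniformly from the given topics."""
    correct = sum(random.random() < ACCURACY[random.choice(topics)]
                  for _ in range(n))
    return correct / n

# On the representative mix the subject is correct about 75% of the time,
# so "75%" is a well-calibrated confidence for that reference class.
print("representative mix:", observed_accuracy(["geography", "music"]))

# Restrict the questions to a single topic and that same "75%" is badly off.
print("music only:        ", observed_accuracy(["music"]))
```

So the numbers obtained from the broad calibration exercise would be exactly the wrong thing to bet with on the narrower topic.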
(I should add that I never thought about this scenario before, so my reasoning here might be wrong.)
If there are any experimental psychologists reading this, maybe they can organise the experiment. I am curious whether people can indeed be calibrated on general questions.