I would like to see a calibration test with open-ended questions rather than multiple choice. Multiple choice makes it easier to judge confidence, but I'm afraid the resulting calibration won't transfer well to other domains.
(The test-taker would have to grade their own test, since open-ended questions may have multiple valid answers, and typos and minor variations shouldn't count as errors. But other than that, the test would work pretty much the same way.)
An open-ended probability calibration test is something I've been planning to build. I'd be curious to hear your thoughts on how the specifics should be implemented. How should test-takers grade their own tests in a way that avoids bias and still gives useful results?
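For concreteness, here's a minimal sketch of how the scoring might work once the self-grading step is done. This assumes the test produces a list of (stated confidence, self-graded correct/incorrect) pairs; the bucketing into 10% bands and the field names are my own illustration, not a spec from the discussion above:

```python
from collections import defaultdict

def calibration_report(answers):
    """answers: list of (stated_confidence, self_graded_correct) pairs,
    e.g. (0.7, True) means "I was 70% sure, and I marked myself right"."""
    buckets = defaultdict(lambda: [0, 0])  # confidence band -> [n_correct, n_total]
    for confidence, correct in answers:
        band = round(confidence, 1)        # group stated confidences into 10% bands
        buckets[band][1] += 1
        if correct:
            buckets[band][0] += 1
    # Well-calibrated means actual accuracy tracks stated confidence in each band.
    for band in sorted(buckets):
        n_correct, n_total = buckets[band]
        print(f"stated {band:.0%}: actual {n_correct / n_total:.0%} ({n_correct}/{n_total})")

calibration_report([(0.6, True), (0.6, False), (0.9, True), (0.9, True), (0.9, False)])
```

This doesn't address the self-grading bias problem itself, but it does suggest one mitigation: have the test-taker record confidence before grading, so the grading step can't quietly shift the stated numbers.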