An open-ended probability calibration test is something I’ve been planning to build. I’d be curious to hear your thoughts on how the specifics should be implemented. How should they grade their own test in a way that avoids bias and still gives useful results?
An open-ended probability calibration test is something I’ve been planning to build. I’d be curious to hear your thoughts on how the specifics should be implemented. How should they grade their own test in a way that avoids bias and still gives useful results?