I somehow missed all notifications of your reply and just stumbled upon it by chance when sharing this post with someone.
I had something very similar with my calibration results, only it was for 65% estimates:
I think your hypotheses 1 and 2 match my intuitions about why this pattern emerges on a test like this. Personally, I suspect a combination of the two is responsible for my “blip” at 65%.
I’m also systematically under-confident here — that’s because I cut my prediction teeth getting black swanned during 2020, so I tend to leave considerable room for tail events (which aren’t captured in this test). I’m not upset about that, as I think it makes for better calibration “in the wild.”