Yes, but it’s not evidence for more miscalibration, and I think “how miscalibrated?” is usually at least as important a question as “how sure are we of how miscalibrated?”.
Sure. So “how miscalibrated” is simply the proportional difference between the values of the two curves. That is, if you rescale the graphs so they are the same size, it’s simply how far apart the curves appear visually.
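To make “proportional difference” concrete, here is a minimal sketch. It assumes the two curves are the expected number of correct predictions at each confidence level (confidence × number of predictions) and the observed number correct. The bucket layout and the function name `proportional_difference` are hypothetical, chosen only for illustration.

```python
# Hypothetical bucket layout: (stated_confidence, n_predictions, n_correct).
# Assumption: the "two curves" are expected correct (confidence * n) and observed correct.

def proportional_difference(confidence: float, n_predictions: int, n_correct: int) -> float:
    """Relative gap between the observed and expected number of correct predictions."""
    expected = confidence * n_predictions
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (n_correct - expected) / expected


# Two made-up buckets with very different numbers of predictions.
buckets = [(0.6, 1000, 540), (0.9, 10, 8)]
for conf, n, correct in buckets:
    expected = conf * n
    print(f"{conf:.0%}: absolute gap {correct - expected:+.1f}, "
          f"proportional gap {proportional_difference(conf, n, correct):+.3f}")
```

In this made-up data the 60% bucket is 60 predictions short and the 90% bucket only 1 short, yet both are roughly 10% short proportionally, which is the sense of “how miscalibrated” meant here.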
adjust the scales of graphs to make them the same size
Note that if you have substantially different numbers of predictions at different confidence levels, you will need to make this adjustment within a single graph. That was the point of my remark about possibly using a logarithmic scale on the y-axis, though I still think that would be confusing.
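The reason a logarithmic y-axis does this adjustment within a single graph: the vertical distance between two points on a log axis is log(observed) − log(expected) = log(observed / expected), which depends only on the ratio and not on the raw counts. A minimal sketch, with hypothetical counts and a hypothetical helper name `log_axis_gap`:

```python
import math


def log_axis_gap(expected: float, observed: float) -> float:
    """Vertical distance between two points on a log-scaled y-axis (natural-log units)."""
    return math.log(observed) - math.log(expected)


# Two hypothetical buckets: 2000 vs 20 predictions at 50% confidence,
# both with the same 10% shortfall relative to the expected count.
big = log_axis_gap(expected=0.5 * 2000, observed=900)
small = log_axis_gap(expected=0.5 * 20, observed=9)
print(big, small, math.log(0.9))
```

Both gaps equal log(0.9) up to floating-point error, so on a log axis the two shortfalls look the same size even though one bucket has 100 times as many predictions. The choice of log base only rescales every gap by the same factor.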