I don’t think both machines are correct at the same rates. If they wager against each other every day, with machine one taking 99-1 against a storm and machine two taking whatever odds it calculates, they’ll OFTEN find bets that both think they can win, and machine two will get the vast majority of the money.
Machine one is WRONG most days. That’s not about confidence, that’s about specificity of prediction.
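A quick simulation makes the betting argument concrete. The numbers are assumptions chosen for illustration, not taken from the original post. Storms occur on 1% of days overall. A hypothetical 2% of days are "high-risk" with a 40.2% storm chance, and the rest carry a 0.2% chance. Machine one always quotes the 1% base rate, and machine two is assumed to know which kind of day it is, so both are well-calibrated. Each day they trade one contract that pays 1 if a storm occurs, priced at the midpoint of their two forecasts, so each machine expects to profit by its own forecast.

```python
import random

# Hypothetical setup (assumed, not from the original post):
# 0.02 * 0.402 + 0.98 * 0.002 = 0.01, so storms happen on 1% of days overall.
P_HIGH_DAY = 0.02
P_STORM_HIGH = 0.402
P_STORM_LOW = 0.002
BASE_RATE = 0.01


def simulate(days: int, seed: int = 0) -> tuple[float, int]:
    """Return (machine two's total profit, number of storms) over `days` daily bets."""
    rng = random.Random(seed)
    profit_two = 0.0
    storms = 0
    for _ in range(days):
        high = rng.random() < P_HIGH_DAY
        p_true = P_STORM_HIGH if high else P_STORM_LOW
        p_one = BASE_RATE  # machine one always quotes the base rate (99-1 against a storm)
        p_two = p_true     # machine two is assumed to know which kind of day it is
        storm = rng.random() < p_true
        storms += storm
        # One contract paying 1 on a storm, traded at the midpoint of the two forecasts.
        price = (p_one + p_two) / 2
        payout = 1.0 if storm else 0.0
        if p_two > p_one:
            profit_two += payout - price  # machine two buys the contract
        else:
            profit_two += price - payout  # machine two sells the contract
    # The game is zero-sum: machine one's profit is -profit_two.
    return profit_two, storms


if __name__ == "__main__":
    n = 200_000
    profit, storms = simulate(n)
    print(f"storm frequency: {storms / n:.4f}")
    print(f"machine two's profit: {profit:.1f} (machine one loses the same amount)")
```

Under these assumed numbers, machine two's expected gain is 0.98 × 0.004 + 0.02 × 0.196 ≈ 0.0078 per day, so a 200,000-day run should come out roughly 1,570 ahead. The midpoint price is just one simple way to split the disagreement: any price strictly between the two forecasts gives both machines a bet each expects to win.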
The original post wasn’t talking about “correctness”; it was talking about calibration, which is a very specific term with a very specific meaning. Machines one and two are both well-calibrated, but there is nothing requiring that two well-calibrated distributions must perform equally well against each other in a series of bets.
Indeed, this is the very point of the original post, so your comment attempting to contradict it did not, in fact, do so.
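The distinction between calibration and betting performance can be checked exactly with a small sketch. The forecasters below are hypothetical, chosen to match a 1% overall storm rate: machine one always says 1%, while machine two says 0.2% on 98% of days and 40.2% on the other 2%, and on each group of days storms occur at exactly the stated rate. Both pass a calibration check, yet the Brier score (a standard proper scoring rule, used here as an assumed stand-in for "performing well") separates them.

```python
from fractions import Fraction as F

# Hypothetical forecasters (assumed, not taken from the original post).
# Each entry: (forecast, share of days with that forecast, true storm frequency on those days).
MACHINE_ONE = [(F(1, 100), F(1), F(1, 100))]
MACHINE_TWO = [
    (F(2, 1000), F(98, 100), F(2, 1000)),
    (F(402, 1000), F(2, 100), F(402, 1000)),
]


def is_calibrated(forecaster) -> bool:
    # Calibrated: whenever it says p, storms happen with frequency exactly p.
    return all(p == freq for p, _, freq in forecaster)


def base_rate(forecaster) -> F:
    # Overall storm frequency implied by the forecaster's groups of days.
    return sum(share * freq for _, share, freq in forecaster)


def brier(forecaster) -> F:
    # Expected squared error (lower is better):
    # storm days contribute (1 - p)^2, calm days contribute p^2.
    return sum(
        share * (freq * (1 - p) ** 2 + (1 - freq) * p ** 2)
        for p, share, freq in forecaster
    )


if __name__ == "__main__":
    for name, machine in [("one", MACHINE_ONE), ("two", MACHINE_TWO)]:
        print(name, is_calibrated(machine), float(base_rate(machine)), float(brier(machine)))
```

Both machines are calibrated and agree on the 1% base rate, but machine one scores 0.0099 and machine two 0.006764. For a calibrated forecast p the expected Brier contribution simplifies to p(1 - p), so the sharper forecaster does better by pushing its forecasts toward 0 and 1.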
The post is titled: “Being right isn’t enough. Confidence is very important.”
It’s not talking about calibration—both are asserted to be equally well-calibrated. It’s talking about a difference it labels “confidence”, and I assert “correctness” or “usefulness” would be better words.