The phrasing “being right isn’t enough” seems a bit off, because the second machine is right more often than the first. So maybe the post is more about calibration vs confidence. But there’s no tension between these two things, because they can be combined into log score—a single dimension that captures the best parts of both.
The phrasing “being right isn’t enough” seems a bit off, because the second machine is right more often than the first. So maybe the post is more about calibration vs confidence. But there’s no tension between these two things, because they can be combined into log score—a single dimension that captures the best parts of both.