I think this could be a good example of what I’m getting at. There are definitely some people, in some situations, who can distinguish a p=0.00004 event from a p=0.00008 event. How? By making a Fermi model or something similar.
A trivial example would be a lottery with calculable odds of success. Just because the odds are low doesn’t mean they can’t be precisely estimated.
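To make that concrete, here’s a minimal sketch (using made-up 6-of-49 game parameters, not any specific lottery) of how a probability that small can still be computed exactly:

```python
import math

# Odds of matching all 6 numbers in a hypothetical 6-of-49 draw.
# The game parameters are illustrative, not any specific lottery.
possible_tickets = math.comb(49, 6)   # 13,983,816
p_jackpot = 1 / possible_tickets

print(f"p = {p_jackpot:.2e}")         # p = 7.15e-08: tiny, but exact
```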
I expect that the kinds of questions GPOpen would consider asking, and which are also incredibly unlikely, would be difficult to estimate to within one order of magnitude. But forecasters may still be able to do a decent job, especially in cases where you can make neat Fermi models, as in the sketch below.
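As a toy illustration of the kind of Fermi model I have in mind: decompose the event into a few factors, put a rough range on each, and combine them by Monte Carlo. Everything here (the factor names, the ranges, and the log-uniform assumption) is invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Each uncertain factor is modelled as log-uniform over a plausible
# range, a common Fermi-style assumption. Names and ranges are made up.
def log_uniform(low, high, size):
    return np.exp(rng.uniform(np.log(low), np.log(high), size))

p_trigger   = log_uniform(1e-3, 1e-2, N)  # chance the triggering condition occurs
p_escalates = log_uniform(1e-2, 1e-1, N)  # chance it escalates, given a trigger
p_observed  = log_uniform(0.3, 0.9, N)    # chance the question resolves YES, given escalation

p_event = p_trigger * p_escalates * p_observed

lo, med, hi = np.percentile(p_event, [5, 50, 95])
print(f"median ~ {med:.1e}, 90% interval [{lo:.1e}, {hi:.1e}]")
```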
However, it of course seems very silly to use the incentive mechanism “you’ll get paid once we know for sure whether the event happened” on such an event. Instead, if resolutions are done with evaluators, there is much more signal.
I’m fairly skeptical of this. From a conceptual perspective, we expect the tails to be dominated by unknown unknowns and black swans. Fermi estimates and other modelling tools are much better at estimating scenarios we expect; if we find ourselves in the extreme tails, it’s often because of events or factors that we failed to model.
“From a conceptual perspective, we expect the tails to be dominated by unknown unknowns and black swans.”
I’m not sure. The reasons things happen at the tails typically fall into categories that could be organized into a small set.
For instance:
- The question wasn’t understood correctly.
- A significant exogenous event happened.
But as we do a bunch of estimates, we could gather empirical data about these possibilities and use it to estimate the potential for future tail events.
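A minimal sketch of what that bookkeeping could look like (the tags, the counts, and the add-one smoothing choice are all invented for illustration):

```python
from collections import Counter

# Hypothetical resolution records, each tagged with why (or whether)
# the outcome landed in the tails. Tags and counts are invented.
records = (
    ["resolved_normally"] * 240
    + ["question_misunderstood"] * 4
    + ["exogenous_event"] * 6
)

counts = Counter(records)
counts["other_tail_cause"] += 0   # placeholder for causes not yet seen
total = len(records)

# Laplace (add-one) smoothing so a cause with zero observations
# still gets a nonzero estimated rate.
for cause, n in sorted(counts.items()):
    p = (n + 1) / (total + len(counts))
    print(f"{cause}: {p:.3f}")
```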
This is a bit different from what I was mentioning, which was more about known but small risks. For instance, the “amount of time I spend on my report next week” may be an outlier if I die, but the chance of serious accident or death can be estimated decently well. These are often repeated known knowns.
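A sketch of how such a known-but-small risk could be folded into the forecast as a mixture (the risk figure and the distribution parameters are placeholders, not actuarial values):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000

# Mixture model for "hours spent on my report next week": a normal-ish
# week, plus a small, separately estimable chance of incapacitation.
# The weekly risk figure below is a placeholder, not an actuarial value.
p_incapacitated = 2e-5
hours_if_working = rng.lognormal(mean=np.log(8), sigma=0.4, size=N)
incapacitated = rng.random(N) < p_incapacitated
hours = np.where(incapacitated, 0.0, hours_if_working)

print(f"modelled P(hours = 0): {p_incapacitated:.0e}")
print(f"simulated median hours: {np.median(hours):.1f}")
```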
You might have people who can distinguish those, but I think it’s a mistake to speak of calibration in that sense, as the word usually refers to people who have actually trained to be calibrated via feedback.
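For what it’s worth, calibration in that feedback-trained sense is itself checkable: bucket a forecaster’s stated probabilities and compare each bucket’s mean forecast to the realized frequency. A sketch with synthetic data (the forecasts and outcomes below simulate a well-calibrated forecaster, just to show the mechanics):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic forecasts; outcomes are drawn so that stated probabilities
# match true frequencies, i.e. a well-calibrated forecaster.
forecasts = rng.uniform(0, 1, 2_000)
outcomes = rng.random(2_000) < forecasts

# Bucket forecasts into deciles and compare said vs. happened.
bucket = np.digitize(forecasts, np.linspace(0, 1, 11)) - 1
for b in range(10):
    mask = bucket == b
    if mask.any():
        print(f"said ~{forecasts[mask].mean():.2f} -> happened {outcomes[mask].mean():.2f}")
```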