I find this question really interesting. I think the core of the issue is the first part:
First, how can we settle who has been a better forecaster so far?
I think a good approach would be betting related. I believe different reasonable betting schemes are possible, which in some cases will give conflicting answers when ranking forecasters. Here’s one reasonable setup:
Let A = probability the first forecaster, Alice, predicts for some event.
Let B = probability the second forecaster, Bob, assigns (suppose B > A wlog).
Define what’s called an option: basically a promissory note to pay 1 point if the event happens, and nothing otherwise.
Alice will write and sell N such options to Bob for price P each, with N and P to be determined.
Alice’s EV is positive if P > A (she expects to pay out A points per option on average).
Bob’s EV is positive if P < B (he expects each option he buys to be worth B points on average).
A specific scheme then stipulates how to determine N and P. After that, comparing forecasters after a number of events just translates to comparing points.
As a simple illustration (without claiming it’s great), here’s one possible scheme for P and N:
Alice and Bob split the difference and set P = 1⁄2 (A + B).
N = 1.
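As a sanity check, the split-the-difference scheme can be simulated directly (a sketch; the probabilities 0.7, 0.4, and 0.8 below are made up for illustration). The forecaster whose probability is closer to the truth accumulates points on average:

```python
import random

def settle_bet(a, b, outcome):
    """One round of the split-the-difference scheme with N = 1.

    a, b: probabilities assigned by Alice and Bob (assume b > a).
    Alice sells Bob one option at P = (a + b) / 2; the option pays
    Bob 1 point if the event happens. Returns (alice_points, bob_points).
    """
    p = (a + b) / 2
    payout = 1.0 if outcome else 0.0
    return p - payout, payout - p

# Illustrative numbers: true probability 0.7; Alice says 0.4, Bob says 0.8.
random.seed(0)
rounds = 100_000
alice_total = bob_total = 0.0
for _ in range(rounds):
    happened = random.random() < 0.7
    da, db = settle_bet(0.4, 0.8, happened)
    alice_total += da
    bob_total += db

# Bob's EV per round is 0.7 - 0.6 = +0.1; Alice's is -0.1.
print(alice_total / rounds, bob_total / rounds)
```

The game is zero-sum by construction, so the two averages are exact negatives of each other.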
One drawback of that scheme is that it doesn’t punish a forecaster much for erroneously assigning a probability of 0% or 100% to an event.
A different structure of the whole setup would involve not two forecasters betting against each other, but each forecaster betting against some “cosmic bookie”. I have some ideas how to make that work too.
And what does this numerical value actually mean, as landing on Mars is not a repetitive random event, nor is it a quantity which we can try measuring like the radius of Saturn?
I don’t see how we could assign some canonical meaning to this numerical value. For every forecaster there can always be a better one in principle, who takes into account more information, does more precise calculations, and happens to have better priors (until we reach the level of Laplace’s demon, at which point probabilities might just degenerate into 0 or 1).
If that’s true, then such a numerical value would seem to be just a subjective property of a given forecaster: it’s whatever that forecaster assigns to the event and uses to estimate how many points (or whatever other metric she cares about) she will have in the future.
I like the idea of defining a betting game ‘forecasters vs cosmic bookie’. Then saying ‘the probability that people will land on Mars by 2040 is 42%’ translates into the semantics ‘I am willing to pay Y < 42 cents for an option that would be worth $1 if we land on Mars by 2040 and $0 otherwise’.
To compare several forecasters we can consider a game in which each player is offered to buy some options of this kind. Suppose that for each x in {1, …, 99} each player is allowed to buy one option for x cents. If one believes that the probability of an event is 30%, then it is profitable for them to buy the 29 cheapest options and nothing more (it does not matter whether one buys the option for 30 cents or not).
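The discrete version is easy to verify directly; a quick sketch, working in integer cents to avoid rounding issues:

```python
belief_cents = 30  # you believe the event has probability 30% = 30 cents on the dollar
# Expected profit of the option priced x cents is belief_cents - x,
# so you buy exactly the options with positive expected profit:
worthwhile = [x for x in range(1, 100) if belief_cents - x > 0]
print(len(worthwhile), max(worthwhile))  # the 29 cheapest options, priced 1..29 cents
```

The 30-cent option has expected profit exactly zero, which is why buying it or not makes no difference.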
To make the calculations simpler, we can make the prices continuous. So one is allowed to buy an option-interval [0, x] for some real x in [0, 1]: by integration its price should be x²/2, and the pay-off is x if the event occurs. If the ‘true’ probability of the event is y, then the expected profit equals yx − x²/2. One can easily see that if you know the value of y, then the optimal strategy sets x = y. The larger the mistake you make, the lower your expected profit. The value of the game is the sum of all the profits, and being a good forecaster means that one can design a strategy with high expected revenue.
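A small numerical check of the continuous game (a sketch; the 42% figure is just the running example): honest reporting x = y maximizes the expected profit yx − x²/2.

```python
def expected_profit(x, y):
    """Expected profit from buying the option-interval [0, x] when the
    true probability of the event is y: the interval costs x**2 / 2
    and pays x if the event occurs."""
    return y * x - x**2 / 2

y = 0.42
# Grid search over possible reports x confirms the optimum sits at x = y:
grid = [i / 1000 for i in range(1001)]
best_x = max(grid, key=lambda x: expected_profit(x, y))
print(best_x)                 # 0.42: honest reporting is optimal
print(expected_profit(y, y))  # maximal expected profit y**2 / 2 ≈ 0.0882
```

Since the profit function is concave in x, any deviation from the truthful report strictly lowers the expectation, which is what makes this a proper scoring setup.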
An important drawback of this approach is that when you correctly estimate the probability of a successful Mars landing to be 42%, the optimal strategy gives expected profit 0.42²/2 ≈ 0.088. However, if the question were ‘what is the probability that people will FAIL to land on Mars by 2040?’, then the same knowledge gives you the answer 58%, and the expected profit is different: 0.58²/2 ≈ 0.168. Hence, the bookie should also sell options that pay when the event does not occur or, equivalently, always consider each question together with its dual, i.e., the question about the event not happening. Now it begins to look like a proper mathematical formalization of forecasting.
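With dual options included, the reward for the same knowledge no longer depends on how the question is phrased; a minimal sketch:

```python
def expected_profit(x, y):
    # Buy the interval [0, x] on an option paying 1 if the event occurs;
    # true probability y, price x**2 / 2, expected payoff y * x.
    return y * x - x**2 / 2

def total_profit(y):
    # Optimal play on a question and its dual: report x = y on the event
    # and x = 1 - y on its complement.
    return expected_profit(y, y) + expected_profit(1 - y, 1 - y)

# Same total reward whether the question asks about landing or failing to land:
print(total_profit(0.42))  # 0.42**2/2 + 0.58**2/2 ≈ 0.2564
print(total_profit(0.58))  # identical by symmetry
```

Pairing each question with its dual makes the scoring a symmetric function of y and 1 − y, removing the framing artifact.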
Still, the problem remains that the choice of available options is arbitrary. Here I assumed that the prices are distributed uniformly in the interval [0, 1], but one could consider some other distribution. The choice of the distribution governs how much you lose when you are off by 1% or 2%, and the loss for mistaking 50% for 51% differs from the loss for mistaking, e.g., 70% for 71%. Tweaking the parameters of the distribution can change the result of any forecasting competition, but this should be fine as long as the parameters are known to the contestants.
That’s perfect; I was thinking along the same lines, with a range of options available for sale, but didn’t do the math and so didn’t realize the necessity of dual options. And you are right, of course: there’s still quite a bit of arbitrariness left. In addition to varying the distribution of options there is, for example, freedom to choose what metric the forecasters are supposed to optimize. It doesn’t have to be EV; in fact, in real life it rarely should be EV, because that ignores risk aversion. Instead we could optimize some utility function that becomes flatter for larger gains, for example by using Kelly betting (i.e., maximizing expected log wealth).
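For concreteness, here is a sketch of Kelly sizing in this setting (the price and belief values are illustrative): a forecaster with belief q, facing options priced at p on the dollar, maximizes expected log wealth by spending the fraction f* = (q − p)/(1 − p) of her bankroll.

```python
import math

def kelly_fraction(q, p):
    """Kelly-optimal fraction of bankroll to spend on options priced p
    (each paying 1 if the event occurs) when you believe the probability
    is q. Classic result: f* = (q - p) / (1 - p) when q > p, else 0."""
    return max(0.0, (q - p) / (1 - p))

def expected_log_growth(f, q, p):
    # E[log wealth] after spending fraction f at price p with belief q:
    # if the event occurs, wealth multiplies by 1 - f + f/p; else by 1 - f.
    return q * math.log(1 - f + f / p) + (1 - q) * math.log(1 - f)

q, p = 0.42, 0.30  # illustrative belief and price
f_star = kelly_fraction(q, p)  # (0.42 - 0.30) / 0.70 ≈ 0.171
# A grid search confirms f_star maximizes expected log growth:
grid = [i / 1000 for i in range(1000)]
best = max(grid, key=lambda f: expected_log_growth(f, q, p))
print(f_star, best)
```

Unlike the EV-maximizer, the Kelly bettor never stakes her whole bankroll on a single question, which is exactly the risk-aversion behavior mentioned above.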