One idea I’m excited about is that predictions can be made about prediction accuracy itself. This seems pretty useful to me.
Example
Say there’s a forecaster, Sophia, who’s making a bunch of predictions for pay. She also makes a meta-prediction of her total prediction score under a log-loss scoring rule (over all of her predictions except the meta-predictions themselves). She says that she’s 90% sure that her total loss score will be between −5 and −12.
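To make the scoring concrete, here’s a minimal sketch with made-up numbers. It assumes the total score is just the sum of per-question log scores over binary questions, and that Sophia derives her 90% interval by simulating outcomes from her own stated probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sophia's stated probabilities for 20 hypothetical binary questions.
probs = rng.uniform(0.55, 0.95, size=20)

def total_log_score(probs, outcomes):
    """Sum of per-question log scores: log(p) if the event happened, log(1 - p) otherwise."""
    picked = np.where(outcomes == 1, probs, 1 - probs)
    return np.log(picked).sum()

# Meta-prediction: if Sophia trusts her own probabilities, she can simulate
# outcomes from them and read off a 90% interval for her eventual total score.
simulated_outcomes = rng.binomial(1, probs, size=(10_000, len(probs)))
simulated_scores = np.array([total_log_score(probs, o) for o in simulated_outcomes])
low, high = np.percentile(simulated_scores, [5, 95])
print(f"90% interval for the total log score: [{low:.1f}, {high:.1f}]")
```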
The problem is that you probably have no good reason to trust Sophia’s meta-prediction unless she has a lot of experience making similar forecasts.
This is somewhat solved if you have a forecaster you trust who can make a prediction based on Sophia’s apparent ability and honesty. The naive approach would be for that forecaster to predict their own distribution of Sophia’s log-loss, but there’s perhaps a simpler solution. If Sophia’s provided loss distribution is correct, that would mean she’s calibrated in this dimension (basically, this is very similar to general forecast calibration). So instead of forecasting a whole distribution of their own, the trusted forecaster could forecast an adjustment to Sophia’s stated distribution. Generally this adjustment would be in the direction of adding expected loss, since Sophia probably has more of an incentive to be overconfident (i.e., to report an expected score that’s more favorable than warranted) than underconfident. The adjustment could perhaps take the form of a percentage modifier (e.g., −30%), a mean shift (e.g., −3 to −8 points), or something else.
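As a toy illustration of those two adjustment formats (the exact conventions and numbers here are my own assumptions, since the post leaves them loose):

```python
import numpy as np

# Hypothetical samples from Sophia's stated distribution of her total log score
# (all negative; more negative = worse, matching the post's convention).
stated_scores = np.random.default_rng(1).normal(loc=-8.5, scale=2.0, size=10_000)

def percentage_adjustment(scores, pct_more_loss):
    """Scale every sampled score toward more loss, e.g. 0.30 adds 30% more expected loss."""
    return scores * (1 + pct_more_loss)

def mean_adjustment(scores, shift):
    """Shift the whole distribution by a fixed number of points (negative = more loss)."""
    return scores + shift

print(stated_scores.mean())                               # Sophia's stated mean, ~ -8.5
print(percentage_adjustment(stated_scores, 0.30).mean())  # "-30%"-style modifier, ~ -11
print(mean_adjustment(stated_scores, -5.0).mean())        # fixed-shift modifier, ~ -13.5
```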
External clients would probably learn not to trust Sophia’s provided expected error directly, but instead the “adjusted” forecast.
This can be quite useful. Now, if Sophia wants to try to “cheat the system” and claim that she’s found new data that decreases her estimated error, the trusted forecaster will pay attention and modify their adjustment accordingly. Sophia will then need to provide solid evidence that she really believes her work and is really calibrated for the trusted forecaster to budge.
I want to call this something like forecast appraisal, attestation, or pinning. Please leave comments if you have ideas.
“Trusted Forecaster” Error
You may be wondering how we ensure that the “trusted” forecaster is actually good. For one thing, they would hopefully go through the same procedure. I imagine there could be a network of “trusted” forecasters who all estimate each other’s predicted “calibration adjustment factors.” This wouldn’t work if observers didn’t trust any of them, or thought they were colluding, but it could work if observers trusted even one predictor in the network. Also, note that over time data would come in and some of this would be verified empirically.
The idea of focusing heavily on “expected loss” seems quite interesting to me. One thing it could encourage is contracts or Service Level Agreements. For instance, I could propose a 50/50 bet to anyone, pegged to a percentile of my expected loss distribution. Like, “I’d be willing to bet $1,000 with anyone that the eventual total error of my forecasts will be less than the 65th percentile of my specified predicted error.” Or, perhaps a “prediction provider” would have to pay back part of their fee, or even more, if the realized error lands at a high percentile of their stated error distribution. This could generally be a good way to verify a set of forecasts. Another example would be to have a prediction group make 1,000 forecasts, then heavily subsidize one question on a popular prediction market that predicts their total error.
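Here’s a minimal sketch of how such a bet could settle. It assumes the stated error distribution is provided as samples and that error is positive with lower being better; those conventions are my own for this sketch:

```python
import numpy as np

def settle_bet(stated_loss_samples, realized_loss, percentile=65, stake=1_000):
    """Settle the 50/50 bet described above.

    Convention assumed here: 'loss' is positive and lower is better.
    The forecaster wins the stake if their realized total loss comes in
    below the chosen percentile of their own stated loss distribution."""
    threshold = np.percentile(stated_loss_samples, percentile)
    return stake if realized_loss < threshold else -stake

# Hypothetical example: stated total-loss distribution and two possible outcomes.
rng = np.random.default_rng(2)
stated = rng.normal(loc=9.0, scale=2.0, size=10_000)
print(settle_bet(stated, realized_loss=8.2))   # +1000: came in under the 65th percentile
print(settle_bet(stated, realized_loss=12.5))  # -1000: came in above it
```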
Markets For Purchasing Prediction Bundles
Of course, trusted forecasters can forecast “calibration adjustment factors” not only for ongoing forecasts, but also for hypothetical ones.
Say you have 500 questions that need to be predicted, and there are multiple agencies that all say they could do a great job predicting these questions. They all give estimates of their mean predicted error, conditional on them doing the prediction work. Then you have a trusted forecaster give a calibration adjustment.
| | Firm’s Predicted Error | Calibration Adjustment | Adjusted Predicted Error |
|---|---|---|---|
| Firm 1 | −20 | −2 | −22 |
| Firm 2 | −12 | −9 | −21 |
| Firm 3 | −15 | −3 | −18 |
(Note: the lower the expected error, the worse)
In this case, Firm 2 makes the best claim, but is judged to be significantly overconfident. Firm 3 has the best adjusted predicted error, so they’re the ones to go with. In fact, you may want to penalize Firm 2 further for being a so-called prediction service with apparently poor calibration.
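A trivial sketch of the selection rule implied by the table, with the numbers copied from it (any further penalty for poor calibration is left out):

```python
firms = {
    "Firm 1": {"predicted": -20, "adjustment": -2},
    "Firm 2": {"predicted": -12, "adjustment": -9},
    "Firm 3": {"predicted": -15, "adjustment": -3},
}

# Adjusted predicted error = firm's own estimate plus the trusted forecaster's adjustment.
adjusted = {name: f["predicted"] + f["adjustment"] for name, f in firms.items()}

# Lower (more negative) is worse, so pick the firm with the highest adjusted value.
best = max(adjusted, key=adjusted.get)
print(adjusted)  # {'Firm 1': -22, 'Firm 2': -21, 'Firm 3': -18}
print(best)      # Firm 3
```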
Correlations
One quick gotcha: one can’t simply combine the predicted errors of individual forecasts as if they were independent to get the distribution of total predicted error, because there are likely to be many correlations between them. For example, if things go “seriously wrong,” it’s likely that many different predictions will have high losses at once. Handling this perfectly would really require one model to have produced all the forecasts, but if that’s not the case, there are likely simple ways to approximate it.
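Here’s a small Monte Carlo sketch of why this matters, with entirely hypothetical numbers: when per-question losses share a common factor, the spread of the total loss is much wider than the independence assumption suggests, even though the mean is the same.

```python
import numpy as np

rng = np.random.default_rng(3)
n_questions, n_sims, rho = 100, 20_000, 0.5

# Per-question losses driven by a shared "things went seriously wrong" factor
# plus independent noise, versus a purely independent baseline.
shared = rng.normal(size=(n_sims, 1))
idio = rng.normal(size=(n_sims, n_questions))
z = np.sqrt(rho) * shared + np.sqrt(1 - rho) * idio
losses_corr = np.exp(0.5 * z)                                          # correlated losses
losses_indep = np.exp(0.5 * rng.normal(size=(n_sims, n_questions)))    # independent losses

for name, losses in [("correlated", losses_corr), ("independent", losses_indep)]:
    total = losses.sum(axis=1)
    print(name, "mean:", round(total.mean(), 1),
          "5th-95th percentile:", np.round(np.percentile(total, [5, 95]), 1))
```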
Bundles vs. Prediction Markets
I’d expect that in many cases, private services will be more cost-effective than posting predictions to full prediction markets. Private services could also offer more privacy and customization. The general selection strategy in the table above could of course include options that involve hosting questions on prediction markets, with the winner chosen based on the adjusted estimates.
“I’d be willing to bet $1,000 with anyone that the eventual total error of my forecasts will be less than the 65th percentile of my specified predicted error.”
I think this is equivalent to applying a non-linear transformation to your proper scoring rule. When things settle, you get paid based both on the outcome of your object-level prediction p and on your meta-prediction q of its score S(p).
Hence:
S(p) + B(q(S(p)))
where B is the “betting scoring function”.
This means getting the scoring rules to work while preserving properness will be tricky (though not necessarily impossible).
One mechanism that might help: each player makes one object-level prediction p and one meta-prediction q, but at resolution you randomly sample one and only one of the two to actually pay out.
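Here’s a sketch of that randomized-resolution mechanism under some assumed details (a single binary question, a log score for the object-level prediction, and a simple threshold bet for the meta-prediction). It just implements the mechanics; it isn’t a claim that this preserves properness.

```python
import math
import random

def log_score(p, outcome):
    """Log score of a binary prediction p given outcome in {0, 1}."""
    return math.log(p if outcome == 1 else 1 - p)

def bet_payout(meta_threshold, realized_score, stake=1.0):
    """Toy meta-bet: win the stake if the realized score beats the stated threshold."""
    return stake if realized_score >= meta_threshold else -stake

def resolve(p, meta_threshold, outcome):
    """Randomly pay out exactly one of the two predictions."""
    realized = log_score(p, outcome)
    if random.random() < 0.5:
        return ("object", realized)                          # pay the object-level log score
    return ("meta", bet_payout(meta_threshold, realized))    # pay the meta-bet only

print(resolve(p=0.8, meta_threshold=-0.5, outcome=1))
```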
Interesting, thanks! Yea, agreed it’s not proper. Coming up with interesting payment / betting structures for “package-of-forecast” combinations seems pretty great to me.
I think this paper might be relevant: https://users.cs.duke.edu/~conitzer/predictionWINE09.pdf
Abstract. A potential downside of prediction markets is that they may incentivize agents to take undesirable actions in the real world. For example, a prediction market for whether a terrorist attack will happen may incentivize terrorism, and an in-house prediction market for whether a product will be successfully released may incentivize sabotage. In this paper, we study principal-aligned prediction mechanisms: mechanisms that do not incentivize undesirable actions. We characterize all principal-aligned proper scoring rules, and we show an “overpayment” result, which roughly states that with n agents, any prediction mechanism that is principal-aligned will, in the worst case, require the principal to pay Θ(n) times as much as a mechanism that is not. We extend our model to allow uncertainties about the principal’s utility and restrictions on agents’ actions, showing a richer characterization and a similar “overpayment” result.
> This is somewhat solved if you have a forecaster you trust who can make a prediction based on Sophia’s apparent ability and honesty. The naive approach would be for that forecaster to predict their own distribution of Sophia’s log-loss, but there’s perhaps a simpler solution. If Sophia’s provided loss distribution is correct, that would mean she’s calibrated in this dimension (basically, this is very similar to general forecast calibration). So instead of forecasting a whole distribution of their own, the trusted forecaster could forecast an adjustment to Sophia’s stated distribution. Generally this adjustment would be in the direction of adding expected loss, since Sophia probably has more of an incentive to be overconfident than underconfident. The adjustment could perhaps take the form of a percentage modifier (e.g., −30%), a mean shift (e.g., −3 to −8 points), or something else.
Is it actually true that forecasters would find it easier to forecast the adjustment?
One nice thing about adjustments is that they can be applied to many forecasts. Like, I can estimate the adjustment for someone’s [list of 500 forecasts] without having to look at each one.
Over time, I assume that there would be heuristics for adjustments, like, “Oh, people of this reference class typically get a +20% adjustment”, similar to margins of error in engineering.
That said, these are my assumptions, I’m not sure what forecasters will find to be the best in practice.