Just to confirm: Writing $p_t$, the probability of the event $A$ at time $t$, as $p_t = \mathbb{E}[\mathbf{1}_A \mid \mathcal{F}_t]$ (here $\mathcal{F}_t$ is the sigma-algebra at time $t$), we see that $p_t$ must be a martingale via the tower rule.
The log-odds $L_t = \log\frac{p_t}{1-p_t}$ are not martingales unless $\langle p\rangle_t \equiv 0$, because Itô gives us
$$dL_t = \frac{dp_t}{p_t(1-p_t)} + \frac{2p_t - 1}{2\,p_t^2(1-p_t)^2}\,d\langle p\rangle_t.$$
So unless $p_t$ is continuous and of bounded variation (⇒ $\langle p\rangle_t \equiv 0$, but for a continuous martingale this also implies that $p_t$ is constant; the integrand of the drift part only vanishes if $p_t = 1/2$ for all $t$), the log-odds are not a martingale.
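As a quick sanity check on the derivatives behind that drift term, here is a small symbolic sketch (my own addition, assuming sympy is available):

```python
# Symbolic check of the Itô ingredients for L = logit(p).
import sympy as sp

p = sp.symbols("p", positive=True)
L = sp.log(p / (1 - p))            # log-odds as a function of p

f1 = sp.diff(L, p)                 # multiplies the dp_t term
f2 = sp.diff(L, p, 2)              # f2/2 is the integrand of the drift term

# The expressions used in the formula above:
assert sp.simplify(f1 - 1 / (p * (1 - p))) == 0
assert sp.simplify(f2 - (2 * p - 1) / (p**2 * (1 - p) ** 2)) == 0

# The drift integrand vanishes only at p = 1/2:
print(sp.solve(sp.Eq(f2, 0), p))   # -> [1/2]
```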
Interesting analysis of the log-odds might still be possible (just use the corresponding discrete-time/jump-process machinery, which is what we naturally get when working with real data), but it’s not obvious to me whether this comes with any advantages over just working with $p_t$ directly.
I think this depends a lot on what you’re interested in, i.e. what scoring rules you use. Someone who runs the same analysis with Brier instead of log-scores might disagree.
More generally, I’m not convinced it makes sense to think of “precision” as a constant, let alone a universal one, since it depends on

- the scoring rule in question: Imagine a set of forecasts that’s awfully calibrated on values <1% and >99%, but perfectly calibrated on values between 1% and 99%. With the log-score, this will probably get a bad precision value, while with Brier this would give a great one. (See the sketch just after this list for the underlying asymmetry between the two scores.)
- someone’s calibration, as you point out with your final calibration plot.
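To make the first point concrete, here is a tiny sketch with made-up numbers (mine, not from the post): a forecast stuck at 0.1% when the true frequency is 5% is punished heavily by the log-score but barely at all by Brier.

```python
# Expected scores of a single forecast p when the event occurs with probability q.
# Illustrative numbers only.
import math

def expected_log_loss(p, q):
    """Expected log score (in nats) of forecasting p when P(event) = q."""
    return -(q * math.log(p) + (1 - q) * math.log(1 - p))

def expected_brier(p, q):
    """Expected Brier score of forecasting p when P(event) = q."""
    return q * (1 - p) ** 2 + (1 - q) * p ** 2

q = 0.05                    # true frequency
for p in (0.001, 0.05):     # badly miscalibrated extreme forecast vs. calibrated forecast
    print(f"p = {p:5.3f}:  log = {expected_log_loss(p, q):.3f}   Brier = {expected_brier(p, q):.4f}")
# Miscalibration at the extreme costs ~0.15 nats of log score but only ~0.002 in Brier.
```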
I don’t think it’s very counterintuitive/undesirable for (what, in practice, is essentially) noise to make worse-than-random forecasts better. As a matter of fact, this also happens if you replace log-scores with Brier in your analysis and use random noise instead of rounding.
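A minimal Monte Carlo sketch of that claim, under assumptions of my own choosing (a single anti-calibrated forecast of 80% for an event with true probability 20%, Gaussian noise added in log-odds space):

```python
# Does log-odds noise improve a worse-than-random forecast under Brier? (Sketch.)
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

q, p0, n = 0.2, 0.8, 1_000_000     # true probability, anti-calibrated forecast, samples
outcomes = (rng.random(n) < q).astype(float)

def mean_brier(noise_sd):
    """Mean Brier score after adding N(0, noise_sd^2) noise to the forecast's log-odds."""
    noisy_p = sigmoid(logit(p0) + noise_sd * rng.standard_normal(n))
    return np.mean((noisy_p - outcomes) ** 2)

for sd in (0.0, 1.0, 3.0, 10.0):
    print(f"noise sd = {sd:4.1f}:  Brier = {mean_brier(sd):.3f}")
# The noiseless forecast scores ~0.52; every noisy version here scores lower (better).
```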
Also, regarding oscillations: I don’t think properties of “precision” obtained from small datasets are too important, for much the same reason that I usually don’t pay a lot of attention to calibration plots obtained from a handful of forecasts.
This conjecture is true and should easily generalise to more general 1-parameter families of centered, symmetric distributions admitting suitable couplings (e.g. additive $\mathcal{N}(0,\sigma^2)$ noise in log-odds space), using the fact that $\log \operatorname{sigmoid}(x+y) + \log \operatorname{sigmoid}(x-y)$ is decreasing in $y$ for all log-odds $x$ and all positive $y$ (QED).
(NB: This fails when replacing log-scores with Brier.)
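For completeness, the monotonicity fact reduces to a one-line derivative computation, using $(\log \operatorname{sigmoid})' = 1 - \operatorname{sigmoid}$:
$$\frac{d}{dy}\Big[\log \operatorname{sigmoid}(x+y) + \log \operatorname{sigmoid}(x-y)\Big] = \operatorname{sigmoid}(x-y) - \operatorname{sigmoid}(x+y) < 0 \quad\text{for } y>0,$$
since the sigmoid is strictly increasing.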
I could make a similar argument for the noise-based version, if I chose to use Brier (or any other scoring rule S that depends only on |p-outcome| and converges to finite values as p tends towards 0 and 1): With sufficiently strong noise, every forecast becomes ≈0% or ≈100% with equal probability, so the expected score in the “large noise limit” converges to (S(0, outcome) + S(1, outcome))/2.
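To spell out the Brier case: S(0, outcome) + S(1, outcome) = outcome² + (1 − outcome)² = 1 for a binary outcome, so that limit is 1/2 regardless of how the event resolves.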