JonahS comments on CFAR’s new focus, and AI Safety

JonahS 3 Dec 2016 3:46 UTC
33 points
A few nitpicks on choice of “Brier-boosting” as a description of CFAR’s approach:

Predictive power is maximized when Brier score is minimized

Brier score is the sum of squared differences between the probabilities assigned to events and indicator variables that are 1 or 0 according to whether the event did or did not occur. Good calibration therefore corresponds to minimizing Brier score rather than maximizing it, whereas “Brier-boosting” suggests maximization.

What’s referred to as “quadratic score” is essentially the same as the negative of Brier score, and so maximizing quadratic score corresponds to maximizing predictive power.
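As a rough illustration of the definition above, here is a minimal Python sketch. The function names and the sample forecasts are made up for this example, and the score is kept as a sum (not a mean) to match the description in this comment.

```python
def brier_score(probabilities, outcomes):
    """Sum of squared differences between forecasts and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes))


def quadratic_score(probabilities, outcomes):
    """Negative of the Brier score, so that higher means better."""
    return -brier_score(probabilities, outcomes)


# Hypothetical data: three events, of which the first and third occurred.
outcomes = [1, 0, 1]
sharp_forecasts = [0.9, 0.2, 0.7]   # mostly leaning the right way
poor_forecasts = [0.1, 0.8, 0.3]    # mostly leaning the wrong way

sharp_brier = brier_score(sharp_forecasts, outcomes)
poor_brier = brier_score(poor_forecasts, outcomes)

# The better forecaster has the LOWER Brier score and the HIGHER quadratic score.
print(sharp_brier, poor_brier)
print(quadratic_score(sharp_forecasts, outcomes), quadratic_score(poor_forecasts, outcomes))
```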

Brier score fails to capture our intuitions about assignment of small probabilities

A more substantive point is that even though the Brier score is minimized by being well-calibrated, the way in which it varies with the probability assigned to an event does not correspond to our intuitions about how good a probabilistic prediction is. For example, suppose four observers A, B, C and D assigned probabilities 0.5, 0.4, 0.01 and 0.000001 (respectively) to an event E occurring, and the event turns out to occur. Intuitively, B’s prediction is only slightly worse than A’s prediction, whereas D’s prediction is much worse than C’s prediction. But the difference between the increases in B’s and A’s Brier scores is 0.36 − 0.25 = 0.11, which is much larger than the corresponding difference for D and C, which is approximately 0.02.
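The arithmetic in this example can be checked with a short Python sketch. The helper name `brier_penalty` is an assumption made for this illustration; it is the Brier increase for a single event that did occur.

```python
def brier_penalty(p):
    """Increase in Brier score when an event assigned probability p occurs."""
    return (1 - p) ** 2


# Probabilities the four observers assigned to event E, which then occurred.
forecasts = {"A": 0.5, "B": 0.4, "C": 0.01, "D": 0.000001}
penalties = {name: brier_penalty(p) for name, p in forecasts.items()}

gap_ab = penalties["B"] - penalties["A"]  # 0.36 - 0.25
gap_cd = penalties["D"] - penalties["C"]  # roughly 0.02

print(f"B vs A: {gap_ab:.6f}")
print(f"D vs C: {gap_cd:.6f}")
```

Under the Brier score, B is penalized relative to A by more than five times as much as D is relative to C, which runs against the intuition described above.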

Brier score is not constant across mathematically equivalent formulations of the same prediction

Suppose that a basketball player is to make three free throws, and observer A predicts that the player makes each one with probability p. Suppose that observer B accepts observer A’s estimate, notes that (treating the throws as independent) this implies that the probability of the player making all three free throws is p^3, and so makes that single prediction instead.

Then if the player makes all three free throws, observer A’s Brier score increases by

3*(1 - p)^2

while observer B’s Brier score increases by

(1 - p^3)^2

But these two expressions are not equal in general, e.g. for p = 0.9 the first is 0.03 and the second is 0.073441. So changes to Brier score depend on the formulation of a prediction as opposed to the prediction itself.
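The two expressions can be compared directly. This is a small Python sketch with hypothetical function names, assuming (as above) that all three free throws are made.

```python
def brier_increase_separate(p, shots=3):
    """Observer A: one prediction of probability p per shot, every shot made."""
    return shots * (1 - p) ** 2


def brier_increase_joint(p, shots=3):
    """Observer B: a single prediction p**shots that all shots are made."""
    return (1 - p ** shots) ** 2


p = 0.9
separate = brier_increase_separate(p)
joint = brier_increase_joint(p)

# Same underlying belief, different Brier increases.
print(f"A (three predictions): {separate:.6f}")
print(f"B (one prediction):    {joint:.6f}")
```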

======

The logarithmic scoring rule handles small probabilities well and is invariant under changing the representation of a prediction, and so is preferred. I first learned of this from Eliezer’s essay A Technical Explanation of Technical Explanation.
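Both properties can be seen on the two examples above. This is a minimal Python sketch, assuming the logarithmic score of an event that occurred is the natural log of the probability assigned to it (higher is better); the function name is made up for this illustration.

```python
import math


def log_score(p):
    """Logarithmic score for an event that occurred and was assigned probability p."""
    return math.log(p)


# Small probabilities: how much worse is B than A, and D than C?
gap_ab = log_score(0.5) - log_score(0.4)
gap_cd = log_score(0.01) - log_score(0.000001)
print(f"B vs A: {gap_ab:.4f}")
print(f"D vs C: {gap_cd:.4f}")

# Invariance: three separate predictions of p versus one prediction of p**3,
# in the case where all three free throws are made.
p = 0.9
separate = 3 * log_score(p)
joint = log_score(p ** 3)
print(separate, joint)
```

Here D is penalized far more heavily relative to C than B is relative to A, and the separate and joint formulations of the free-throw prediction receive the same score up to floating-point error, since log(p^3) = 3 log(p).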

Minimizing logarithmic loss (the negative of the logarithmic score) is equivalent to maximizing the likelihood function for logistic regression / binary classification. Unfortunately, the phrase “likelihood boosting” has one more syllable than “Brier boosting” and doesn’t have the same alliterative ring to it, so I don’t have an actionable alternative suggestion :P.
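The likelihood connection can be checked numerically. This is a sketch under the assumption that outcomes are independent 0/1 events, so the likelihood is a product of Bernoulli terms; the function names and sample data are hypothetical.

```python
import math


def bernoulli_likelihood(probabilities, outcomes):
    """Probability the forecasts assign to the observed 0/1 outcomes."""
    likelihood = 1.0
    for p, y in zip(probabilities, outcomes):
        likelihood *= p if y == 1 else 1 - p
    return likelihood


def log_loss(probabilities, outcomes):
    """Summed negative log probability assigned to what actually happened."""
    return -sum(math.log(p if y == 1 else 1 - p)
                for p, y in zip(probabilities, outcomes))


outcomes = [1, 0, 1]
forecasts = [0.9, 0.2, 0.7]

likelihood = bernoulli_likelihood(forecasts, outcomes)
loss = log_loss(forecasts, outcomes)

# The log loss is the negative log of the likelihood, so whichever forecasts
# minimize one also maximize the other.
print(loss, -math.log(likelihood))
```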
- sarahconstantin 3 Dec 2016 19:46 UTC
  9 points
  Good point!
  
  (And thanks for explaining clearly and noting where you learned about logarithmic scoring.)
  
  I would suggest that “helping people think more clearly so that they’ll find truth better, instead of telling them what to believe” already has a name, and it’s “the Socratic method.” It’s unfortunate that this has the connotation of “do everything in a Q&A format”, though.
- Academian 3 Dec 2016 4:37 UTC
  7 points
  “Brier scoring” is not a very natural scoring rule (log scoring is better; Jonah and Eliezer already covered the main reasons, and it’s what I used when designing the Credence Game for similar reasons). It also sets off a negative reaction in me when I see someone naming their world-changing strategy after it. It makes me think the people naming their strategy don’t have enough mathematician friends to advise them otherwise… which, as evidenced by these comments, is not the case for CFAR ;)
  
  Possible re-naming options that contrast well with “signal boosting”:
  - Score boosting
  - Signal filtering
  - Signal vetting
  - AnnaSalamon 3 Dec 2016 4:56 UTC
    0 points
    Got any that contrast with “raising awareness” or “outreach”?
    - lukeprog 7 Dec 2016 18:06 UTC
      0 points
      “Accuracy-boosting” or “raising accuracy”?
    - Mqrius 5 Dec 2016 5:46 UTC
      0 points
      Brainstormy words in that corner of concept-space:
      
      Raising the sanity waterline
      Downstream effects
      Giving someone a footstool so that they can see for themselves, instead of you telling them what’s on the other side of the wall
      ~~Critical mass~~ ~~hivemind~~ Compounding thinktank intelligence
      Doing thinks better
      
      [switches framing]
      Signal boosting means sending more signal so that it arrives better on the other side. There are more ways of doing so, though:
      
      Noise reduction
      (The entire big field of) error correction methods
      Specifying the signal’s constraints clearly so that the other side can run a fit to it
      Stop sending the signal and instead build the generator on the other side
- Paul Crowley 8 Dec 2016 20:35 UTC
  4 points
  I don’t think the first problem is a big deal. No-one worries about “I boosted that from a Priority 3 to a Priority 1 bug”.