Summary of my Participation in the Good Judgment Project

Gunnar_Zarncke 3 Jun 2015 21:51 UTC

13 points

Follow-Up to Good Judgment Project, Season Three.

During the last forecasting season I took part in the Good Judgment Project (GJP; see also the blog), and this is a short summary of my participation (actually prompted by hamnox’s comment).

The GJP asks for forecasts of world events such as

Ukraine conflict
Arctic ice cap melting
Ebola outbreak duration
Chinese sea conflict
ISIS attacks
Terrorist attacks
Oil price
Certain exchange rates
Election results
and many other political events

To participate in the study one has to register (I can’t remember exactly where I stumbled over the link, possibly the one at the top). One also has to take a preparatory online course and pass an online test; at least I had to complete them. Whether the result affected my assignment to a group I can’t say. The course explains the scoring and gives recommendations for making good forecasts (choose questions where one has an edge, estimate early, update often, do post-mortems). The test seems to check calibration and accuracy by asking about known (mostly political) events and how sure one is about them.

The current forecasting season started in November 2014 and has just ended. I invested significantly less than half an hour a week, on 8 of about 100 questions (and thus less than I had projected in an early questionnaire). I made 2 to 15 updates on each of these questions and earned a score in the middle range (mostly because I got hit by an unexpected terrorist attack). As I just learned, I was assigned to the study condition where I could see neither the total group estimate nor the estimates of the other group members, only their comments. I was somewhat disappointed by this, as I had hoped to learn something from how the scores developed. Too bad I wasn’t in a prediction market group. But I hope to get the study results later.

I will not take part in further rounds, as I shy away from the effort that the mostly political forecasts require. They are political because the sponsor (guess who) is mostly interested in political events, less in economic, environmental, scientific or other types. But I enjoyed forecasting the Arctic ice cap melting and Ebola, and netted a better than average score on those.

The scoring (at least in this group) is interesting: it uses a Brier score averaged a) over all forecast questions and b) within each question, over all the days for which a forecast is provided. I intended to game that by betting on questions that a) I could forecast well and b) had a reliably expected outcome. Sadly there were few of type a.
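The two-level averaging can be sketched in a few lines of Python. This is a minimal sketch, not GJP code: it assumes the multi-category Brier score (squared error summed over every answer option, so a yes/no question scores from 0 to 2, lower is better) and assumes a forecast stays in force until it is updated. The function names are hypothetical.

```python
from statistics import mean


def brier(probs, outcome_index):
    """Multi-category Brier score for one forecast.

    probs: probabilities for each answer option, in a fixed order.
    outcome_index: index of the option that actually happened.
    Sums squared errors over all options, so a yes/no question scores
    from 0 (perfect) to 2 (maximally wrong). Lower is better.
    """
    return sum((p - (1.0 if i == outcome_index else 0.0)) ** 2
               for i, p in enumerate(probs))


def question_score(forecasts, n_days, outcome_index):
    """Average the daily Brier score over all days the question was open.

    forecasts: dict mapping a 0-based day to the probability vector
    submitted that day. Assumption: each forecast is carried forward
    until the next update, and the first forecast is on day 0.
    """
    if 0 not in forecasts:
        raise ValueError("this sketch assumes a forecast on day 0")
    daily = []
    current = None
    for day in range(n_days):
        current = forecasts.get(day, current)  # keep last forecast if no update
        daily.append(brier(current, outcome_index))
    return mean(daily)


def overall_score(question_scores):
    """Average the per-question scores across all questions."""
    return mean(question_scores)


# Hypothetical yes/no question (index 0 = yes, 1 = no) open for 10 days;
# the event did not happen, so outcome_index is 1.
score = question_score({0: [0.3, 0.7], 5: [0.1, 0.9]}, n_days=10, outcome_index=1)
print(round(score, 3))
```

In this example the daily score is 0.18 for the first five days and 0.02 after the update, so the question averages to 0.1. An early confident and correct forecast pays off on every remaining day of the window.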

From this experience I learned that

such prediction organizations ask mostly for political events,
political events are hard to predict and
predicting political events requires a lot of background information.
I’m below average in predicting political events (at least compared to my group, which I’d guess has more interest in politics than I do), but
I’m above average on non-political topics.


Davidmanheim 4 Jun 2015 2:33 UTC
7 points
I was also involved. I did not need to do any training, and my group was in a prediction market, so it was a very different group. The data should be available once the papers are published; that may take a year or two. (If you’re a grad student, at least, you might be able to ask for the data for writing your own papers; when I talked to one of the organizers, he seemed to think that it could be worked out.)

IARPA is the intelligence version of DARPA, so they were in fact interested in the types of questions that intel analysts would face. On the other hand, the places where prediction is useful are not at all limited to these areas, and GJP is interested in extending beyond them in the future.
satt 5 Jun 2015 2:44 UTC
4 points
Thanks for posting this. The GJP’s sparked only sporadic discussion here, maybe because it focuses so much on world politics as opposed to stereotypically LWesque STEM stuff, and that’s a bit of a shame. I’m a STEM nerd myself, but in a way that made the GJP more enticing because I thought participating in it might nudge me to learn a tiny bit about world politics (it did), and because I wanted to see whether I could beat the averages despite having minimal domain-specific knowledge (I could).

To participate in that study one has to register (can’t remember where exactly I stumbled over the link, possibly the one at the top). And one has to do a preparatory online course and one has to pass an online test. At least I had to complete it. Whether the result affected my assignment to any group I can’t say.

IIRC I filled out a pre-registration form that just asked for bare-bones demographic info like occupation and highest-level education qualification. After the GJP let me into the study, but before they assigned me to a group, I think I filled out a longer background survey about myself, and did the political knowledge/calibration test.

I did the short training session after getting the group assignment. Presumably the (sub)group assignments are randomized so the researchers can make causal inferences about which treatments generate better forecasts.

The current forecasting season started in November 2014 and has just ended.

It’s actually still running for my group. We have 31 questions still open which don’t close until the 8th or 9th.

I invested significantly less than half an hour a week on 8 questions of about 100 (and thus less than I projected in an early questionnaire). I did 2 to 15 updates for these questions and I earned a score in the middle range (mostly due to getting hit by an unexpected terrorist attack).

I wound up putting in more time than I think I anticipated, probably more than half an hour a week most weeks, and so far I’ve made 335 predictions on 36 questions. Since GJP started displaying my rank in my group, my overall Brier score’s consistently been in the lowest 20%.

As I just learned I was assigned to the study condition where I could neither see the total group estimate nor the estimates of the other group members, only their comments. I was somewhat disappointed by this as I had hoped to learn something from how the scores developed. Too bad I wasn’t in a prediction marke[t] group.

Maybe we were in the same group. My group also had no prediction markets, but I could read “tips” written by other people who were apparently chatting to each other in a forum to which I didn’t have access. I also couldn’t/can’t see other users’ predictions in real time, although I could see the group’s median Brier score for each question after it was closed.

I intended to game that by betting on questions that a) I could forecast well and b) that had an expected reliable outcome. Sadly there were few of type a.

Ah, but if you didn’t make a prediction on a question, you still got a Brier score for it — GJP gave you the median score of the group members who did make a prediction. (Or that’s how it worked for me, anyway.) So the trick is to choose questions where you expect to do better than the median predictor, even if those questions look difficult. (Perhaps especially questions which look difficult to you, because other people might be overconfident about them.) The sample size is small, but on each of the 4 questions where my Brier score was high (≥ 0.5) I scored 0.07-0.15 fewer points than the group score, which really helped drive down my overall score.
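A minimal sketch of why that trick works, assuming (as described above) that a skipped question is scored with the median of the group members who did forecast it. The question names, group scores and function name are all hypothetical.

```python
from statistics import mean, median


def overall_with_median_fill(my_scores, group_scores):
    """Overall score when skipped questions get the group median.

    my_scores: question id -> my Brier score, only for questions I forecast.
    group_scores: question id -> Brier scores of the group members who
    forecast that question.
    Assumption (per the comment above): a skipped question is scored with
    the median of those members' scores. Lower is better.
    """
    per_question = []
    for qid, scores in group_scores.items():
        if qid in my_scores:
            per_question.append(my_scores[qid])
        else:
            per_question.append(median(scores))
    return mean(per_question)


# Hypothetical group: an easy question and a hard one.
group = {"easy": [0.02, 0.04, 0.06], "hard": [0.6, 0.8, 1.0]}

# Option A: forecast only the easy question, slightly beating its median (0.04).
only_easy = overall_with_median_fill({"easy": 0.03}, group)
# Option B: forecast only the hard question, beating its median (0.8) by 0.15.
only_hard = overall_with_median_fill({"hard": 0.65}, group)

print(round(only_easy, 3), round(only_hard, 3))
```

Option B ends up lower (better), about 0.345 against 0.415, even though its own score of 0.65 looks bad in isolation: what counts is the margin over the median you would otherwise be assigned.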
- political events are hard to predict
Mostly true, although election-result questions tended to be nice & easy. A few other political events weren’t obvious slam-dunks if I looked at them from a distance, but became very obvious slam-dunks as soon as I investigated them.

Example: “Will a referendum on Quebec’s affiliation with Canada be held before 31 December 2014?”, which I didn’t touch until October. But when I ran Google News searches about it, the lack of positive evidence for expecting a referendum was stark, and I immediately gave it only a 5% probability. During October I monotonically lowered that as tips came in pointing out that the one party pushing for a referendum was unpopular and leader-less, and that a referendum would take time to organize. For all of November & December I had that question at 0%, and my final Brier score for it halved the (already tiny) group score.

I also discovered that the prediction difficulty of the political questions was often time-dependent. IARPA tried to pick relevant & topical questions, which meant that a lot of questions were provoked by news coverage. But because the news prefers dramatic, sudden events, quite a few of the resulting questions were about transient crises or other hot issues that rapidly cooled down and became highly predictable within days or weeks, leaving them easy to predict for most of the (months-long) prediction windows.

A good tactic therefore turned out to be: just wait. It’d be interesting to see how people would do in a GJP re-run where the questions had shorter prediction windows, since that tactic would surely be less successful there.
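A toy model of why shorter windows would blunt that tactic: a hypothetical yes/no question that is genuinely uncertain (forecast at 50%) for its first 10 days and near-certain (2%) afterwards, scored with a daily-averaged two-outcome Brier score. The function names and all numbers are illustrative assumptions, not GJP data or GJP's exact scoring rules.

```python
def brier_binary(p_yes, happened):
    """Two-outcome Brier score for a yes/no question (range 0 to 2)."""
    y = 1.0 if happened else 0.0
    return (p_yes - y) ** 2 + ((1 - p_yes) - (1 - y)) ** 2


def waiting_score(window_days, hard_days, p_uncertain=0.5, p_late=0.02,
                  happened=False):
    """Time-averaged Brier score for a forecaster who sits at p_uncertain
    while a crisis is live, then moves to p_late once it has cooled.

    Hypothetical model: the question is only genuinely uncertain for the
    first `hard_days` of a `window_days` window.
    """
    hard = brier_binary(p_uncertain, happened)
    easy = brier_binary(p_late, happened)
    easy_days = window_days - hard_days
    return (hard_days * hard + easy_days * easy) / window_days


long_window = waiting_score(window_days=90, hard_days=10)
short_window = waiting_score(window_days=20, hard_days=10)
print(round(long_window, 3), round(short_window, 3))
```

With a 90-day window the uncertain days are diluted to a score of about 0.056; with a 20-day window they make up half the average and the score rises to about 0.25.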
- Gunnar_Zarncke 5 Jun 2015 21:17 UTC
  2 points
  Parent
  
  It’s actually still running for my group.
  
  Yes, it actually is. Somehow I misinterpreted one of the last emails. At least all the questions I forecast on are closed.
  
  Maybe we were in the same group.
  
  Maybe. The best forecaster is grossz18 in my group.
  - satt 6 Jun 2015 15:19 UTC
    1 point
    Parent
    
    Maybe. The best forecaster is grossz18 in my group.
    
    No. 1 in my group is morrell. Our groups are probably different after all...or GJP is feeding us different rankings as part of the experiment!
- Gunnar_Zarncke 5 Jun 2015 21:13 UTC
  2 points
  Parent
  Thank you for your detailed contribution!
  
  Ah, but if you didn’t make a prediction on a question, you still got a Brier score for it — GJP gave you the median score of the group members who did make a prediction.
  
  Hm, yes, that makes sense, as these scores are listed in grey in my column too. I just didn’t make that connection, and I can’t remember it being explicitly explained that way, but maybe I misunderstood which averaging applies to what, especially before actually seeing the UI.
  
  So the trick is to choose questions where you expect to do better than the median predictor, even if those questions look difficult...
  
  Yes. That seems like another sensible way to game it.
  
  I recommend sending this as a reply to one of the last emails; they seem to actually read them.
- ChristianKl 5 Jun 2015 13:22 UTC
  2 points
  Parent
  
  The GJP’s caught little interest here, maybe because it focuses so much on world politics as opposed to stereotypically LWesque STEM stuff, and that’s a bit of a shame.
  
  I don’t think “little interest” is a fair description. Searching LW for Good Judgment Project provides 290 search hits.
  - satt 6 Jun 2015 18:14 UTC
    2 points
    Parent
    I just did a search for ("Good Judgment Project" OR GJP) and got only 87 hits, so most of your results might merely have been recent comments/posts in LW’s sidebar.
    
    Looking through the first couple of pages of hits, I see
    
    1. a link post for GJP season 3, and the only comments are the ones I linked in the grandparent (I upvoted them anyway because they’re interesting feedback)
    2. a link post about an earlier GJP round, which does actually have a lot of GJP talk among its 55 comments
    3. this Gunnar_Zarncke post
    4. part 1 of Morendil’s 2012 “Raising the forecasting waterline”, about participating in the GJP, which has 108 comments but most aren’t about the GJP
    5. a short follow-up by gwern to post 2, with 2 comments
    6. part 2 of Morendil’s “Raising the forecasting waterline” (and 22 comments)
    7. your user page, which comes up because of the parent comment
    8. a link post to an FT article on forecasting, with comments that don’t talk about the GJP
    9. the list of recent comments for LW’s Discussion section, which comes up because of the parent comment
    10. VipulNaik’s “Some historical evaluations of forecasting”, which discusses the GJP for a paragraph (the only comment doesn’t mention the GJP)
    11. the list of Discussion posts tagged “tetlock”, which matches because post 8 comes up
    12. Morendil starting a short subthread about the GJP under “The Martial Art of Rationality”
    13. post 5 at a different URL
    14. an unrelated post which only comes up because Google indexed it while my GJP-mentioning comment was in the sidebar
    15. another VipulNaik post which again discusses the GJP for a paragraph; all 3 comments talk about something else
    16. Morendil’s “Raising the waterline” mentions the GJP a few times (none of its comments do)
    17. VipulNaik’s “An overview of forecasting for politics, conflict, and political violence” lists various forecasting efforts, and discusses the GJP as one of them across several bullet points (0 comments)
    18. VipulNaik’s list of submitted posts
    19. Morendil mentioning the GJP in a one-sentence comment
    20. VipulNaik again giving the GJP a paragraph in “Domains of forecasting” (none of the 4 comments mention the GJP)
    
    That is more commentary than I remembered (I’d definitely forgotten about Morendil’s 3 top-level posts), and yeah, “little interest” is too strong. I’ll change that to “sporadic discussion”, which I think is fair. Aside from Morendil’s posts and this G_Z post, most of the mentions of GJP on LW seem to be asides or links to external articles, and they’re spread out over about 4 years.