I agree most of your methods for evaluating predictions are good. But I think I mostly have a different use case, in two ways. First, for a lot of things I’m not working off an explicit model, where I could compare the model’s predictions to reality in many different circumstances. When I give Joe Biden an X% chance of winning the nomination, this isn’t coming from a general process that I can check against past elections and other candidates, it’s just something like “Joe Biden feels X% likely to win”. I think this is probably part of what you mean by hard mode vs. easy mode.
Second, I think most people who try to make predictions aren’t trying to do something that looks like “beat the market”. Accepting the market price is probably good enough for most purposes for everyone except investors, gamblers, and domain experts. For me the most valuable type of prediction is when I’m trying to operate in a field without a market, either because our society is bad at getting the right markets up (eg predicting whether coronavirus will be a global pandemic, where stock prices are relevant but there’s no real prediction market in it) or because it’s a more personal matter (eg me trying to decide whether I would be happier if I quit my job). Calibration is one of the few methods that works here, although I agree with your criticisms of it.
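The calibration method mentioned here can be sketched concretely. The idea is to bucket resolved predictions by stated confidence and check whether, say, the 70% bucket actually came true about 70% of the time. This is a minimal illustration; the function name and input format are my own, not anything from the discussion:

```python
# Minimal calibration check, assuming a list of (stated_probability, outcome)
# pairs for resolved predictions. Names and bucketing scheme are illustrative.
from collections import defaultdict

def calibration_table(predictions):
    """Group predictions into confidence buckets and compare stated
    confidence with the observed frequency of the event occurring."""
    buckets = defaultdict(list)
    for prob, happened in predictions:
        bucket = round(prob, 1)  # e.g. everything near 70% lands in the 0.7 bucket
        buckets[bucket].append(1 if happened else 0)
    # For each bucket: (observed frequency, number of predictions)
    return {
        bucket: (sum(outcomes) / len(outcomes), len(outcomes))
        for bucket, outcomes in sorted(buckets.items())
    }

# A well-calibrated forecaster's 0.7 bucket should resolve true ~70% of the time.
table = calibration_table([(0.7, True), (0.7, True), (0.7, False), (0.9, True)])
```

The appeal for personal, market-less questions is that it needs nothing but your own past predictions; the familiar weakness (which the criticisms alluded to above apply to) is that you can be well calibrated in aggregate while being badly off on any individual question.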
I’m not sure we disagree on Silver’s Trump prediction and the superforecasters’ Brexit prediction. I agree they did as well as possible with the information that they had and do not deserve criticism. We seem to have a semantic disagreement on whether a prediction that does this (but ascribes less than 50% to the winning side on a binary question) should be called “intelligently-made but wrong” or “right”. I’m not really committed to my side of this question except insofar as I want to convey information clearly.
I’m not sure it’s possible to do the thing that you’re doing here, which is to grade my predictions (with hindsight of what really happened) while trying not to let your hindsight contaminate your grades. With my own hindsight, I agree with most of your criticisms, but I don’t know whether that’s because you have shown me the error of my ways, or because Scott-with-hindsight and Zvi-with-hindsight are naturally closer together than either of us is to Scott-without-hindsight (and, presumably, Zvi-without-hindsight).
A few cases where I do have thoughts—one reason I priced Biden so low was that in December 2018 when I wrote those it was unclear whether he was even going to run (I can’t find a prediction market for that month, but prediction markets a few months later were only in the low 70s or so). Now it seems obvious that he would run, but at the time you could have made good money on PredictIt by predicting that. My Biden estimate was higher than the prediction market’s Biden estimate at that time (and in fact I made lots of money betting on Biden in the prediction markets in January 2019), so I don’t think I was clearly and egregiously too low.
Same with Trump being the GOP nominee. I agree now it seems like it was always a sure thing. But in late 2018, he’d been president for just under two years, it was still this unprecedented situation of a complete novice who offended everyone taking the presidency, we were in the middle of a government shutdown that Trump was bungling so badly that even the Republicans were starting to grumble, and the idea of GOP falling out of love with Trump just felt much more plausible than it does now. It’s possible this was still silly even in late 2018, but I don’t know how to surgically remove my hindsight.
I will defend my very high confidence on Trump approval below 50, based on it never having gotten above 46 in his presidency so far. While I agree a 9-11 scale event could change that, that sort of thing probably only happens once every ten years or so. Trump got a boost from a rally-round-the-flag effect around COVID, and it was clearly bigger than any other boost he’s gotten in his administration, but it only took him up to 45.8% or so, so even very large black swans aren’t enough. The largest boost Obama got in his administration, after killing Osama, was only 5 points above baseline, still not enough for Trump to hit 50. And it wouldn’t just require an event like this to happen, but to happen at exactly the right time to peak on 1/1/2020.
May staying in power feels wrong now, but she had beaten Labour recently enough that she didn’t have to quit if she didn’t want to, she had survived a no-confidence motion recently enough that it would have been illegal to no-confidence her again until December (and it probably wouldn’t happen exactly in December), and she had failed badly many times before without resigning. So I figured she wasn’t interested in resigning just because Brexit was hard, and nobody else could kick her out against her will, so she would probably stay in power. I guess she got tired of failing so many times. You were right and I was wrong, but I don’t think you could have (or should have been able to) convinced me of that last year.
Good responses. I do think a lot of the value is the back-and-forth, and seeing which logic holds up and which doesn’t. Bunch of things to talk about.
First, the discussion of models vs. instincts. I agree that one should sometimes make predictions without an explicit model. I’m not sure whether one can be said to ever not have an implicit model and still be doing the scribe thing instead of the actor thing—my model is that when someone like me makes a prediction on instinct there’s an implicit (unconscious) model somewhere, even if it’s bad and would be modified heavily or rejected outright on reflection by system 2.
I do think ‘internal consistency at a given time’ is a valid check on instinctive predictions, perhaps even the best go-to. It’s a way to turn instincts into a rough model slash check to see if your instincts make any sense slash find out what your instincts actually are. It also checks for a bunch of bias issues (e.g. the feminist bank teller thing often becomes obvious even if it was subtle).
Agree that it’s good to predict more in fields without markets rather than with markets. One could explicitly not look at markets until predictions are made; I definitely did that often. It is helpful.
I think the “right” versus “intelligently-made but wrong” thing is at least important semantics. In our society, telling someone they were “wrong” versus “right” is a big deal. At least most people will get wrong impressions of what’s going on if you say that Scott Adams saying (as he explicitly did) 98% Trump in May 2016 “was right” as a baseline. And that happens! They think that should be considered good predicting, because you were super confident and it happened. Or that scene in Zero Dark Thirty, where the woman says “100%” that Osama is where she thinks he is, because that’s how you sound confident in a meeting. If you correctly solve the question “what is the probability of X given everything we know now?” and say 75% and then X doesn’t happen, but 75% was the best guess you could have made at the time, I think saying you are “wrong” is both incorrect and going to do net harm. It’s not enough to rely on someone’s long term score, because most people don’t get one, and it would get ignored most of the time even if they did have one.
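A proper scoring rule makes this distinction concrete: a 75% forecast that misses takes a penalty, but a much smaller one than an overconfident 98% forecast that misses, and over many questions the honest 75% forecaster comes out ahead. A minimal Brier-score sketch (a standard rule, not anything specific to this discussion):

```python
# Brier score: squared error between the stated probability and the 0/1
# outcome. Lower is better; 0 is a perfect forecast, 1 is maximally wrong.
def brier(prob, outcome):
    return (prob - (1 if outcome else 0)) ** 2

# The event fails to happen in both cases below. The well-judged 75%
# forecast is penalized far less than the overconfident 98% one.
assert brier(0.75, False) < brier(0.98, False)
```

This is exactly why calling the 75% forecaster “wrong” misleads: by the scoring rule that actually rewards good forecasting, the Scott Adams style 98% call was the worse prediction even though the event happened to occur.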
Biden markets were indeed dumb early on if your report is right, and I missed that boat because I wasn’t thinking about it—I only got into the PredictIt game this time around when Yang got up into the 8% range and there was actual free money. I don’t think it was inevitable he would run but you definitely made an amazing trade. 70% to run plus dominating the polls does not equal 15%! Especially given that when the 70% event occurred, his value more than doubled.
That’s another good metric for evaluating trades/prediction that I forgot to include more explicitly. Look at the new market prices slash probability estimates after an event, and see what that implies about the old prediction. In this case, clearly it says that 15% was stupidly low. I like to think I too would have done that trade if I’d thought about it at the time, maybe even sold other candidates to get more of it, and looked at the general election prices. In hindsight it’s clear 20% is still way too low, but a much smaller mistake and certainly more understandable.
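The arithmetic behind “70% to run plus dominating the polls does not equal 15%” can be made explicit. Using the rough market prices from the discussion above (the numbers are those approximations, not exact quotes):

```python
# Back out the conditional probability the market's own prices imply,
# assuming (per the discussion) ~70% that Biden runs and ~15% that he
# is the eventual nominee.
p_run = 0.70          # market: probability he enters the race
p_nominee = 0.15      # market: unconditional probability he is the nominee

# P(nominee) = P(run) * P(nominee | run), so the implied conditional is:
p_nominee_given_run = p_nominee / p_run  # roughly 0.21

# Consistency check after the event: once "runs" resolves true, the
# nominee price should settle near P(nominee | run). A jump to more than
# double the old 15% means the market's own prices were inconsistent.
```

That ~21% conditional probability for the polling front-runner, given that he runs, is the part that looks stupidly low, and the post-announcement price jump is the market admitting as much.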
I agree that removing hindsight can be tough. I do think that it is now clear that e.g. Trump not getting nominated would have been extraordinarily hard to have happen without a health issue, but did we have enough information for that? I think we mostly did? But I can’t be sure I’m playing fair here, either.
On the 50% approval thing, I do think we had unusually uneventful times until Covid-19. Covid-19 put Trump at 48.5%, and let’s face it, he had to try really hard to not break 50%, but he did manage it. Wasn’t easy, team effort.
May it seemed to me (at the time) like she would keep going until failure but would quit on actual failure, but again hindsight is always a concern.
This also points to something: it might be good in general to write down basic reasoning when making predictions, to help prevent hindsight bias. And also, if you get the right answer for the wrong reasons, in important senses you can still mark yourself wrong in ways that let one improve.
Thanks (as always) for your thoughts.