Scott Alexander has given his verdict on our predictions for Covid from April 2020.
This seems like an excellent opportunity to reflect on those predictions. I’ll also attempt to render my verdict on the predictions, based on the principles I discuss in Evaluating Predictions in Hindsight under Hard Mode, as Scott’s already done the Easy Mode work.
Afterwards, I’ll give my take on the Assorted Links from that post as well.
Note on Methodology
This post is where I gave my predictions. Note that I don’t give my fair probabilities here. Instead, I give what prices I’d be willing to wager at. This is an important distinction.
The coward’s method of proposing a wager is to say ‘Bob, you think this is 90% likely, I think that’s too high, so surely you will give me 9:1 odds.’ That’s nonsense, of course, because Bob previously thought those odds were fair, and now someone wants to bet against him. Why should Bob let someone pick which predictions of his someone bets against at fair odds?
The epistemic hero’s method would be to give your own fair probability, and offer to meet in the middle via the Green Knight test (you’d do the math and combine into one wager). Thus, ‘Bob, you think this is 90% likely, I think it is 10% likely, so let’s bet at even odds.’
The gambler’s method is to consider the prediction as if it were an initial value for a prediction market, and reveal how far you’d be willing to move the odds while still being comfortable wagering. That doesn’t mean you would then think the odds were fair, or that you’d be comfortable if someone wanted to bet against you on your own side at those new odds. Thus, you say something like “Bob, you say 90%, I’m willing to bet against that at 80% odds” which means you think it’s at most 80%, so you’re willing to place bets first at 90%, then 85%, then 80%, so long as cumulative size isn’t too large, but then you’ll stop and won’t place the one at 75%. If Alice comes along and says “You stopped at 80% odds, I want to bet against this at 75% odds” you might or might not accept her wager. Often you wouldn’t.
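To make the difference between the hero’s and the gambler’s methods concrete, here is a minimal Python sketch. The function names, the 5-point step size and the `accepts` rule are illustrative assumptions of mine; the 90%, 10%, 80% and 75% figures are the ones from the examples above.

```python
def hero_price(your_pct: int, their_pct: int) -> float:
    """Meet-in-the-middle price between two stated fair probabilities (percent)."""
    return (your_pct + their_pct) / 2


def gambler_ladder(market_pct: int, limit_pct: int, step_pct: int = 5) -> list[int]:
    """Prices (percent) at which the gambler still bets against the market.

    Walks from the market price down to the limit, inclusive.
    Assumes the gambler is betting the price down, so limit_pct <= market_pct.
    """
    return list(range(market_pct, limit_pct - 1, -step_pct))


def accepts(ladder: list[int], offered_pct: int) -> bool:
    """Hypothetical simplification: only prices on the ladder are automatic accepts."""
    return offered_pct in ladder


ladder = gambler_ladder(90, 80)
print(hero_price(90, 10))   # Bob at 90%, you at 10%
print(ladder)               # prices you are willing to bet at
print(accepts(ladder, 75))  # Alice's offer at 75%
```

The ladder comes out as 90, 85, 80, and Alice’s 75% is not on it. In practice the answer to Alice is "maybe" rather than a hard no, which this sketch does not try to capture.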
It’s not a first-best solution to (at least sort of) keep two sets of books in your head, where the odds on the game say 50%, you’re willing to bet up to 55%, but your gut tells you your side will win 70% of the time. It is still often a very good practical solution to keep both sets of books, and mostly avoid placing wagers inside the 55%-70% window. One way to think of that is you’d be at 70% as fair value if you hadn’t seen the market, but you respect the market and know you’re often wrong. But if the market wanted to move higher, you’d offer no objections.
In this case, looking at the gambler’s limit prices improves my calibration score, because I was overconfident in the overall arc of the pandemic and our reactions to it, which is why my log score was highly unimpressive. In Hard Mode, we consider everything including reasoning, so such calculations are mostly set aside either way.
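For anyone who has not seen log scoring before, here is a rough sketch of why overconfidence is punished so heavily. It assumes the natural-log form averaged over questions; I have not checked which base or normalization Scott’s grading used, and the probabilities below are made up for illustration rather than taken from either of our lists.

```python
import math


def log_score(forecasts):
    """Average natural-log score over (probability_of_true, outcome) pairs.

    Each forecast is scored by the log of the probability it assigned to
    what actually happened. Scores are <= 0 and closer to 0 is better.
    """
    total = 0.0
    for prob_true, happened in forecasts:
        prob_of_outcome = prob_true if happened else 1.0 - prob_true
        total += math.log(prob_of_outcome)
    return total / len(forecasts)


# Hypothetical: the same miss, stated at 90% versus at a limit price of 80%.
bold = log_score([(0.90, False)])
hedged = log_score([(0.80, False)])
print(bold, hedged)
```

The miss at 90% scores worse than the same miss at 80%, which is the sense in which the limit prices flatter my calibration.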
I encourage you to follow along by reading my previous thoughts in the original post as you go down the list of questions, to get the full explanations, as for space reasons I’m cutting them short here.
Questions
False. The biggest mistake I made in this pandemic was vastly underestimating the strength of the control system. My explanation on this question makes that very clear. I thought it wasn’t likely that people would be willing to continue containment that didn’t make progress even until June, and I was very wrong. Should have been higher than 60%. I consider this prediction a rather large error.
False on Scott’s grading. They did relax things somewhat in between, so I’d view this as ambiguous. Under my read at the time I think there was a major (but far from total) relaxation that was then rescinded. Despite this, 10% was a really bad prediction here no matter the interpretation: there clearly was a plausible path for fully sustained lockdowns thanks to the control systems and the way California was choosing its risk tolerance levels. This needed to be at least 25%.
False. This one comes out looking correct. By the time the prediction was made this was sub-1%.
False. This was closer than I would have expected; seasonality helped a lot and I underestimated it. My guess is the right guess was a little higher than this, something like 35% or 40%.
True. I essentially say ‘I think this is higher than 90% but for various reasons it’s tough to bet things up higher than 90%.’ In hindsight this is so high a threshold it’s essentially a Can’t Happen, but given information at the time I don’t think 90% is crazy low – it’s plausible 95% is a better prediction but I don’t think more than that would have been wise.
True. I did win this one, but again it feels overconfident given what we knew at the time, and I bet this one up a bit too high, although China’s presumed willingness to fudge numbers matters. 85% would have been a better stopping point.
True. On reflection, giving a full 10% to ‘China has the worst death toll but officially denies it’ seems crazy high, and I’m fine with this one at 80% with only a 5% difference between the two numbers. I didn’t think hard enough about that question at the time.
True, I got this overturned ‘on appeal’ via a Twitter poll. My logic at the time goes on to explicitly say that I expect this to remain the narrative even if it’s no longer true, rather than being a prediction that almost never does another city end up getting hit harder. On reflection, indeed do many things come to pass, narratives shift in strange ways and 90% was the better prediction.
False. This was either damn close (95,963!) or not close at all because China cares about round numbers and was only going to let it cross 100k if they had no choice, and they had plenty of choice. I was surprised to see China’s case count had risen this much, and it’s kind of suspiciously close to 100k given that a lot of people have incentives not to report cases and China doesn’t want to cross round thresholds. I’m going to say that 40% was a pretty good prediction.
True! Vaccine timelines being so fast was a huge outlier. We succeeded here, but only by a few weeks, despite things going mind-blowingly right and seeing much better results than anyone expected. AstraZeneca messed up their trials but that was the only major thing that went wrong, and it’s hard to see where things could have gone much better short of some small country going rogue. I’m not sure whether 40% or 50% was the better prediction here, but that seems like a good range.
11. Best scientific consensus ends up being that hydroxychloroquine was significantly effective: 20%
Sell to 15% or so, while noting that I think the chance of it actually being effective is much higher than that.
False. I like this assessment in hindsight. We now more fully understand the extent to which The Narrative pushes hard on things like this, and would have made it exceedingly difficult for HCQ to be accepted as effective even if it was effective. It would have needed to be a game changer.
12. I personally will get coronavirus (as per my best guess if I had it; positive test not needed): 30%
Sell to 20% at least, and also what the hell?
False. Yeah, I didn’t sell enough of this; no way he was at 20% risk to catch this and his prediction here seems even sillier now.
13. Someone I am close to (housemate or close family member) will get coronavirus: 60%
Sell to 40%.
False. This seems like a more reasonable place to have stopped than the 20% from #12. I want to say still a bit high, but room here for some people who might not take good precautions. Note that the 2:1 ratio here between these two questions is definitely silly and selling them both down a third indicates a mistake somewhere.
14. General consensus is that we (April 2020 US) were overreacting: 50%
15. General consensus is that we (April 2020 US) were underreacting: 20%
“General consensus will be that we were reacting stupidly. We reacted wrong. That’s an easy call. The question is, will that be widely seen as an underreaction, an overreaction, something that’s neither, or will there be a lack of consensus? What does it take to get a ‘consensus’? Who counts?
My guess is that there flat out won’t be consensus.
…so I’m going to sell the overreacting contract down to 30%, but stop there because people are bad at such things and find ways to rewrite history to suit their narratives. I’m going to hold the 20% on underreacting”
False on both, as expected there’s no consensus apart from the consensus that we were acting stupidly. Which is hard to avoid as a conclusion, given we did contradictory things in different places and at different times, and also dropped some rather important balls. But even then I’m not super confident you’d call it a ‘consensus.’ By only going down to 30% and 20%, I seem to be implying I’m not that confident in a lack of consensus, so this seems like I should have been somewhat bolder.
16. General consensus is that summer made coronavirus significantly less dangerous: 70%
True. Again consensus can be tough, so 70% seems reasonable or even a bit high. I don’t think we could be super confident in this at the time.
17. …and there is a catastrophic (50K+ US deaths, or more major lockdowns, after at least a month without these things) second wave in autumn: 30%
True. I stand by this being a substantial underdog for exactly the reason I noted, which is that this is a parlay of several things that each must happen – we need to go under the bar, then back over the bar, on multiple fronts. There were Major Lockdowns each month, but it’s not clear there were ‘more major lockdowns’ each month, so I do think technically this evaluates to true as written, but as far as intent it’s not clear to me this fully happened because it’s not clear lockdowns sufficiently lifted. Parlays are hard to win!
18. I personally am back to working not-at-home: 90%
False. Seeing this as different from the lockdown percentage seems clearly right in hindsight, and the reason I’m still too high on this is that I got the lockdown probability wrong. If we compound that 25%+ with another 10% here we get that this can be at most around 65%, and also other things can always happen so 60% seems like a more reasonable maximum.
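The compounding here is easy to check. This is a sketch that treats the two ways of failing (continued lockdown, and not going back to the office for some other reason) as independent, which is a simplifying assumption of mine, as is the function name.

```python
def chance_all_avoided(*failure_probs):
    """Probability that none of several independent failure modes occurs."""
    result = 1.0
    for p in failure_probs:
        result *= 1.0 - p
    return result


# 25% lockdown risk compounded with another 10% of "something else".
print(chance_all_avoided(0.25, 0.10))
# With lockdown risk at 30% instead (the "+" in 25%+).
print(chance_all_avoided(0.30, 0.10))
```

With exactly 25% the ceiling is 67.5%, and with 30% it is 63%, so "at most around 65%" is the right neighborhood before leaving room for other things going wrong.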
19. At least half of states send every voter a mail-in ballot in 2020 presidential election: 20%
False. I don’t think this was ever close to happening, exactly for the reasons I laid out. Not moving markets too far when you’re anchored, especially betting on a big favorite, is always good policy.
20. PredictIt is uncertain (less than 95% sure) who won the presidential election for more than 24 hours after Election Day: 20%
True. Thinking about all the right things here, still getting to the wrong answer. Seeing accusations of fraud as unlikely looks silly in hindsight, especially given the extended willingness of the market to stay insane long after the verdict was in. You can say it should have been under 95% for 24 hours, but you can’t say that for a month out. So that reasoning wasn’t great. The question is how close the election had to actually be in terms of tipping point margin of victory, in order to allow it to be thrown into question, in each direction, and thus how likely such a result was. The tipping point state was Biden+0.6%. My guess is that Biden+3% or Trump+1% would have settled things quickly. That range seems more like a 30% shot than a 10% or 20% shot, but also we got a very large amount of fraud accusation versus reasonable priors in April even given that we should have expected a lot more fraud accusations than people did expect. I think 20%-25% seems like the right range in hindsight.
I also looked at Scott’s non-Covid predictions, but we’ll postpone looking at those until Scott grades them.
Appendix: Metaculus Monday Links Thoughts
Hypermind looks promising at first glance.
Vitalik Buterin talks about his adventures winning $50,000 betting against Trump on Ethereum prediction market Augur.
My takeaway from Vitalik’s journey is that it took $50,000 worth of time and technical expertise to make that $50,000, and the only reason it made sense for Vitalik to do it was because of the value of reporting on the results and in learning by doing. Essentially this is a start-up style operation to see if things can scale, and even with the completely insane market and uniquely huge event the opportunity size wasn’t great. Perhaps in 2024 such things will be ready for prime time but for now I would treat the operational risks involved as bigger than the profit margins offered.
What’s most weird to me is that the various prediction markets stayed in line with each other, despite very different participation restrictions and costs of doing arbitrage. My guess is that a lot of people involved were not thinking about real costs and effective odds, rather thinking about whether the market prices lined up and were ‘fair’ in some sense.
This week on Metaculus: will a third-party candidate win 5%+ of the popular vote in 2024? Users say 15% chance
Scott is betting against. I’m not. Perot in ’92 and ’96 and Anderson in ’80 broke this barrier, making it 3 of the last 11 elections, and there are plausible known routes to this in ’24 (e.g. Trump as 3rd party, Trump (or someone in his image) as Republican causing a third party run, Libertarians running Amash or Romney, or a proper run somehow by Kanye West or another billionaire, or even a Warren/Sanders style scorched earth campaign on the left if Biden runs again). Hell, the way things are weirding who knows who will run. If anything I’d be at 20% rather than 15%.
Also, will Bitcoin outperform the US stock market over the next five years, at 51%. I started out thinking – of course it’s 50-50! By the efficient market hypothesis, if any asset was obviously going to do better than another, people would change the price until it wasn’t. But on second thought that’s wrong – stocks have a higher than 50% chance of beating treasuries over the same period because of a risk premium. Maybe there’s no intuitive way to think about this, you have to have opinions on the underlying fundamentals, and it’s only 51% by coincidence?
It’s at exactly 50/50 now! I quoted Scott in full above to ensure I fully represent his thinking here. Looking at this market tricked me into trying to put in a prediction, despite that then putting a burden on me (in theory anyway) to update that prediction continuously or lose points, but it said I was Forbidden to do that, so I didn’t.
This does not reflect well on Metaculus. The 50% number is crazy, or at a minimum, it represents a very strong rejection of the Efficient Market Hypothesis, and would make Bitcoin what is known as a Screaming Buy.
This comes up every time I see Bitcoin price distribution predictions. Bitcoin can only go as low as $0. Bitcoin could, in theory, go up not only to $100k but to $1 million or more.
If the Bitcoin distribution centers on the same place as the stock market, it is a screaming buy compared to the stock market, and you should put a substantial portion of your net worth into BTC.
If Bitcoin is priced efficiently now, then that implies it is more likely to fall than rise, even with a substantial risk premium, because that’s the only way for the math to come out even. The alternative is to both think BTC is priced fairly and that almost none of that value is the potential to rocket to the moon, which doesn’t seem right to me at all. If you think BTC can’t rocket, BTC is a bad buy.
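Here is a sketch of the math behind that claim. It assumes each asset’s five-year return multiple is lognormal, which is my modeling choice for illustration and not something Metaculus or anyone else has claimed. The volatility numbers are hypothetical. Under that assumption the mean is the median times exp(sigma^2 / 2), so the fatter the right tail, the bigger the gap between "what usually happens" and "what you get on average".

```python
import math


def lognormal_mean(median, sigma):
    """Expected multiple of a lognormal asset with the given median and log-volatility."""
    return median * math.exp(sigma ** 2 / 2)


def lognormal_median(mean, sigma):
    """Median multiple implied by a given expected multiple and log-volatility."""
    return mean * math.exp(-(sigma ** 2) / 2)


# Hypothetical five-year log-volatilities, for illustration only.
STOCK_SIGMA = 0.4
BTC_SIGMA = 1.5

# Case 1: same median (a coin flip as to which does better, if independent).
print(lognormal_mean(1.5, STOCK_SIGMA), lognormal_mean(1.5, BTC_SIGMA))

# Case 2: same expected multiple (roughly, "priced efficiently" with no extra premium).
print(lognormal_median(1.5, STOCK_SIGMA), lognormal_median(1.5, BTC_SIGMA))
```

In the first case the expected BTC multiple is far above the stock one, which is the screaming buy. In the second case the BTC median falls below 1, meaning it is more likely to fall than rise. A distribution with a different shape could break this, but it would have to give BTC very little chance of rocketing.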
Perhaps Metaculus thinks Bitcoin is indeed a screaming buy. That’s not a crazy thing to think. But if it does think that, it seems like an awfully big coincidence that this landed on exactly 50%.
There’s no trade, since (as many people reminded me) Metaculus is not a prediction market and you can’t trade on its values, but there’s still a big contradiction with market prices here.
None of this is investment advice in any way, but: My model of BTC at the moment is that expected returns for holding BTC are positive, in excess of its fair risk premium, but that in a large majority of worlds it will be outperformed by the stock market, often with very large declines. You also have to account for tax liability and the chance someone will steal your bitcoins.
The weekly Covid update will be posted on Thursday as per usual.
I was going to write up my thoughts on this but it would be easier to just comment here.
I agree with your assessments for almost all of these. I was most impressed by your understanding of the politics in Q9 & 11 (China and Hydroxychloroquine) and by your predicting the lack of consensus for Q14 & 15.
A couple where I have a question:
1. On 6/7 (US highest toll official & unofficial) I had a bit more probability on Brazil (similar to India, more than China) – given large population (2/3rds US) and approach of the government.
Regarding official vs unofficial, you only mention deliberate lying but I had more expectation of insufficient / bad testing hiding true amounts than lying. According to WSJ Russia’s excess deaths are 4.8x higher than their official deaths (compared to 1.7x for US). This isn’t enough to overtake the US but I think this gives an idea of the scale of the potential problem. Mexico’s excess deaths are higher than Brazil’s despite having 35% fewer official cases. (India isn’t included in those numbers—excess deaths stats aren’t available I think).
Does that change your mind as to what a good prediction would have been?
2. On q17 (second wave) your prediction for p(17|16) is ~29%. Given that we are in a world where there is a general consensus that summer made things less bad, 29% seems low for a second wave even given the difficult operationalisation? My corresponding number was 50% which still seems better to me (although I messed up q16 so we actually predicted the same for 17 itself). In terms of which way it resolves, I think just numbers of deaths resolves this as clearly true (assuming by Autumn we mean 22 Sep – 21 Dec), both in terms of official result and intent:
Was there a second wave in Autumn? Yes, in late Autumn running into early Winter.
Russia’s higher death toll might or might not be mostly Covid, but I figured its population wasn’t high enough. Even if all of Russia gets it, they’d need a pretty high fatality rate to catch us given what was likely happening here. Brazil similarly I figured wouldn’t document all that well and had a smaller pop.
2. I still think that there’s enough different ways this can fail that 30% is reasonable, and I dunno where the 29% comes from here? Presumably it would be higher than the 30% baseline for p(17|16), what am I missing? (And the way it resolves to false is if we say it’s a third wave that happened rather than a second, not that the numbers don’t match, and I agree that this is wrong and it resolves to true).
Yes, I agree Russia was unlikely to be above US for population reasons, I mentioned them more as an example of how bad under-reporting can be—I can’t think of a way other than Covid to get 147k unaccounted for excess deaths but I could be missing something. I had concerns about this in all 3 of China, India and Brazil (although I guess there’s the chance that we wouldn’t get (accurate) excess deaths numbers anyway). 85% for 6 seems right but only dropping 5% for 17 seems low.
A commenter on Scott’s post has made a case for India deaths being higher than US (enough to convince Scott it seems).
p(17|16) = p(17) / p(16) = 0.2 / 0.7 ~ 0.29 (as p(17|¬16) = 0)
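The same calculation as a small sketch, assuming (as above) that 17 can only happen if 16 does. The function name is mine. The 0.2 and 0.7 are the numbers used in the line above, and the 0.3 is the figure listed next to question 17, shown only for comparison.

```python
def conditional_when_implied(p_event, p_condition):
    """p(event | condition) when the event can only occur if the condition holds.

    In that case p(event and condition) = p(event), so the conditional
    probability is simply p(event) / p(condition).
    """
    if not 0 < p_condition <= 1:
        raise ValueError("p_condition must be in (0, 1]")
    if not 0 <= p_event <= p_condition:
        raise ValueError("p_event cannot exceed p_condition if it implies it")
    return p_event / p_condition


print(round(conditional_when_implied(0.2, 0.7), 2))  # using 20% for q17
print(round(conditional_when_implied(0.3, 0.7), 2))  # using 30% for q17
```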
It’s possible / likely that I’m still missing how difficult it is to win a parlay but:
Given Covid is seen as seasonal by the end of the year, there was very likely some wave in Autumn—the main question is whether it meets the conditions set out in 17
At the time of prediction it seemed almost certain that we would get below the thresholds within the next month or two
I expected (but wasn’t certain) that a second wave would take us back above one of those thresholds.
There remains the question of having a wave in the middle (so the Autumn wave is not the second wave). Here my model was expecting a profile in the US more like what happened in the UK/Europe, where cases/deaths were at a very low level for most of the Summer. This is a common thread in a few of my other predictions about US numbers: I generally underpredicted slightly but noticeably, and this was a significant cause of that. So yeah, definitely an oversight from me in that regard.
Aaah this was so much fun! I learned a lot from you looking back at your thinking from 12 months ago and seeing what you’ve learned since. This is one of my favorite posts in a while (along with this, incidentally, which I recommend everyone read because it was great).
> My takeaway from Vitalik’s journey is that it took $50,000 worth of time and technical expertise to make that $50,000
My key takeaway here, besides the technical expertise required, was that in terms of capital requirements, current prediction market designs make it very easy to push a probability away from 0 or 1, and very hard to push towards it. (Vitalik even responded to this experience with a prediction market design that does not have this problem. Maybe something will come of this.)
Anyway, Vitalik needed capital of ~$1 million in ethereum to put ~$300k of DAI into the market, for which he got ~$50k profit, while those bidding on the other side only had to put in ~$50k. Assuming they had pursued the same DAI-based strategy as Vitalik, which seemed to require holding 3x of the bet amount in ethereum, they still would’ve only needed ~$150k in ethereum.
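Putting those rough numbers side by side, as a back-of-the-envelope sketch. All figures are the approximate ones quoted above, and the 3x collateral multiple is the one I inferred from Vitalik’s write-up, so treat it as an assumption.

```python
def capital_needed(stake, collateral_multiple):
    """Ethereum capital required to back a DAI stake at a given collateral multiple."""
    return stake * collateral_multiple


VITALIK_CAPITAL = 1_000_000  # approx. ethereum held as collateral
VITALIK_STAKE = 300_000      # approx. DAI put into the market
VITALIK_PROFIT = 50_000      # approx. profit
OTHER_SIDE_STAKE = 50_000    # approx. amount risked by the other side
COLLATERAL_MULTIPLE = 3      # assumed, as described above

other_side_capital = capital_needed(OTHER_SIDE_STAKE, COLLATERAL_MULTIPLE)
return_on_capital = VITALIK_PROFIT / VITALIK_CAPITAL
capital_ratio = VITALIK_CAPITAL / other_side_capital

print(other_side_capital)
print(return_on_capital)
print(capital_ratio)
```

That is about a 5% return on the capital tied up, with the side pushing toward certainty needing several times the capital of the side pushing away from it.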
I want to crowdsource crypto predictions amongst the rationalists because it’s surely better than nothing. Maybe something anonymous. Any thoughts on obvious things to do when doing such a thing?
I’d be interested in a post like this: https://www.lesswrong.com/posts/jcCMsg46RhcZJrTxP/survey-on-cortical-uniformity-an-expert-amplification
Two notes:
If you polled only people who saw your Twitter poll, that is a highly biased group, in that everyone polled had heard of you, and probably had heard your reporting on how bad it was in New York. A huge number of people did not actually hear that news in the first place, so would be highly unlikely to think NY was hardest hit.
The second wave happened around June, which isn’t autumn. The third wave started in Autumn, but didn’t peak until Winter. So I would have rated that one False on all counts for getting the timing wrong.
Plausible that it was somewhat biased, and I would be happy to have others run the same poll with a different group to verify the result, but I do think it establishes that the result is at least ambiguous.
That’s quite the nitpick. I like it. Technically, yes it did say ‘second’ and as I noted, parlays are really hard to win, but I didn’t interpret the word second there as doing enough work, slash there’s dispute over whether the middle wave counts as a wave. I think it’s better to take my licks here.
I’m confused. In theory, $50k currently invested in VTI could also go to any of those values. Is there something I’m missing about the relative likelihood of different outcomes that would make Bitcoin the more attractive investment? I feel like there’s some Econ 101 lesson I’m forgetting here.
In this case, isn’t the trade to just use the info Metaculus provides to inform your trades elsewhere? In a way, that’s an advantage of having Metaculus in addition to money-based prediction markets—predictors at money-based vs. points-based prediction markets have different motivations for predicting, so they’re likely to be self-selected from different populations and may generate different, complementary predictions. Granted, for any individual question it would be easier to be able to trade directly in the money-based market, but I think there’s an overall benefit in having both types available.
Well, no. If you can’t trade directly in a way that moves the price, it’s likely that OTHERS also have information that it’s wrong and couldn’t move the price. So the market information is far more likely to be correct than the Metaculus information.
If you can’t middle, it’s not arbitrage.
Yeah, I could do with more explanation here too. I see that ‘EMH implies 50-50 odds’ is clearly false, and not only because of the risk premium. And I see why bitcoin could be a great buy with a 50% chance of outperforming the stock market. But I don’t see why it obviously would be.
It definitely seems more volatile, but why couldn’t a sensible person judge that it is:
much more likely than the stock market to crater to ~0
more likely than the stock market to rise dramatically
only negligibly (if at all) more likely than the stock market to rise to insanely high levels
and that this all nets out to a ~50% chance of outperforming the stock market, and also an EV similar to (or less than) that of an index fund?