The fact that they’re measuring accuracy in a pretty bad way is evidence against them having a good algorithm.
Here’s Anthony Aguirre (Metaculus) and Julia Galef on Rationally Speaking.
Anthony: On the results side, there’s now an accrued track record of a couple of hundred predictions that have been resolved, and you can just look at the numbers. So, that shows that it does work quite well.
Julia: Oh, how do you measure how well it works?
Anthony: There’s a few ways — going from the bad but easy to explain, to the better but harder to explain…
Julia: That’s a good progression.
Anthony: And there’s the worst way, which I won’t even use — which is just to give you some examples of great predictions that it made. This I hate, so I won’t even do it.
Julia: Good for you for shunning that.
Anthony: So looking over sort of the last half year or so, since December 1st, for example… If you ask for how many predictions was Metaculus on the right side of 50% — above 50% if it happened or below 50% if it didn’t happen — that happens 77 out of 81 times the question resolved, so that’s quite good.
And some of the aficionados will know about Brier scores. That’s sort of the fairly easy to understand way to do it, which is that you assign a zero if something doesn’t happen, and a one if something does happen. Then you take the difference between the predicted probability and that number. So if you predict 20% and it doesn’t happen, that’s a difference of .2, and if you predict 80% and it does happen, that’s also .2, because it’s the difference between 80% and one. Then you square that number.
So Brier scores can run from basically zero to one, where low numbers are good. And if you calculate that for that same set of 80 questions, it’s .072, which is a pretty good score.
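To make the two measures Anthony describes concrete, here is a minimal sketch in Python. The function names and the three sample forecasts are made up for illustration and are not Metaculus data. Averaging the squared differences over all questions is the usual way a Brier score is reported, and treating a forecast of exactly 50% as being on neither side is an assumption the transcript does not address.

```python
def right_side_of_half(forecasts):
    """Count forecasts above 50% that happened or below 50% that did not."""
    count = 0
    for p, happened in forecasts:
        if p == 0.5:
            continue  # assumption: exactly 50% is on neither side
        if (p > 0.5) == happened:
            count += 1
    return count


def brier_score(forecasts):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    total = 0.0
    for p, happened in forecasts:
        outcome = 1.0 if happened else 0.0
        total += (p - outcome) ** 2
    return total / len(forecasts)


# Hypothetical (probability, did_it_happen) pairs, not real resolved questions.
sample = [(0.2, False), (0.8, True), (0.9, False)]

print(right_side_of_half(sample))     # 2
print(round(brier_score(sample), 3))  # 0.297
```

The first two forecasts each contribute .2 squared, or .04, matching the example in the transcript, while the confident miss at 90% contributes .81 and dominates the average. For comparison, answering 50% on every question yields a Brier score of exactly .25 regardless of the outcomes.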