It’s not a novel algorithm type, just a learning project I did while learning ML frameworks: a fairly simple LSTM plus one dense layer, trained on the predictions and resolutions of about 60% of the resolved predictions from PredictionBook as of September last year (which doesn’t include any of the ones in the contest). The remaining resolved predictions were used for cross-validation or set aside as a test set. An even simpler RNN performs only very slightly worse, though.
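For concreteness, here’s a minimal Keras sketch of what “LSTM + one dense layer” over a sequence of probability assignments could look like; the layer sizes, sequence cap, and padding scheme are placeholders I’ve made up for illustration, not the values the actual repo uses:

```python
# Minimal sketch of an "LSTM + one dense layer" calibration model in Keras.
# Shapes and hyperparameters are illustrative placeholders, not the repo's values.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

MAX_ASSIGNMENTS = 20  # hypothetical cap on probability assignments per proposition

model = keras.Sequential([
    # Each timestep is one probability assignment; shorter sequences are padded with -1.
    layers.Masking(mask_value=-1.0, input_shape=(MAX_ASSIGNMENTS, 1)),
    layers.LSTM(16),
    # Single sigmoid output: predicted probability that the proposition resolves true.
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Example input: a proposition with three assignments (0.6, 0.7, 0.9), padded out.
x = np.full((1, MAX_ASSIGNMENTS, 1), -1.0)
x[0, :3, 0] = [0.6, 0.7, 0.9]
print(model.predict(x))  # untrained output; illustrative only
```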
The details of how the algorithm works are thus somewhat opaque, but from observing how it reacts to input, it seems to lean on the average, weight predictions later in the sequence more heavily (so order matters), and grow more confident as the number of predictions increases, while treating propositions with only one probability assignment as probably being heavily overconfident. It seems to have more or less learnt on its own the insight Tetlock pointed out. Disagreement might also matter to it; I’m not sure.
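Roughly the kind of poking at it I mean, continuing the hypothetical sketch above (with an untrained model the outputs are meaningless; this just shows the mechanics of the probe):

```python
# Feed the model hand-constructed sequences and compare the outputs to see how
# order, sequence length, and lone assignments shift its prediction.
def pad(assignments, max_len=MAX_ASSIGNMENTS):
    x = np.full((1, max_len, 1), -1.0)
    x[0, :len(assignments), 0] = assignments
    return x

probes = [
    [0.9, 0.7, 0.6],  # confident estimate early, lower ones later
    [0.6, 0.7, 0.9],  # the same estimates in rising order
    [0.9],            # a lone, possibly overconfident assignment
]
for probe in probes:
    print(probe, float(model.predict(pad(probe), verbose=0)[0, 0]))
```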
It’s on GitHub at https://github.com/jbeshir/moonbird-predictor-keras; this doesn’t include the data, which I downloaded using https://github.com/jbeshir/predictionbook-extractor. It’s not particularly tidy though, and still includes a lot of unused functionality for additional input features (the words of the proposition, the time between a probability assignment and the due time, etc.) which I didn’t end up using because the dataset was too small for the model to learn any signal in them.
I’m currently working on making the online frontend to the model retrain it automatically at intervals using freshly resolved predictions, mostly for practice building a simple “online” ML system before I move on to trying to build things with more practical application.
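Something like this loop, very roughly; fetch_resolved_predictions() and build_dataset() here are hypothetical placeholders, not functions from the actual repo:

```python
# Sketch of periodic retraining on freshly resolved predictions.
import time

RETRAIN_INTERVAL_SECONDS = 24 * 60 * 60  # e.g. once a day

def retrain_forever(model):
    while True:
        # Hypothetical helpers: pull newly resolved predictions and turn them
        # into padded sequences of probability assignments plus outcomes.
        records = fetch_resolved_predictions()
        x, y = build_dataset(records)
        model.fit(x, y, epochs=5, verbose=0)    # fine-tune on the fresh data
        model.save("moonbird-latest.keras")     # frontend picks up the new weights
        time.sleep(RETRAIN_INTERVAL_SECONDS)
```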
The main reason I ran figures for it against the contest was that some of its individual confidences seemed strange to me, and while the cross-validation results said it was good, I suspected I was getting something wrong somewhere in the process.
PredictionBook itself has a bunch more than three participants and functions as an always-running contest for calibration, although it’s easy to cheat since it’s possible to make and resolve whatever predictions you want. I also participate in GJ Open, which has an eternally ongoing prediction contest. So there are already places where people who want to compete on a running score can do so.
The objective of the contest was less to bring such an opportunity into existence than to see whether it would incentivise some people who had been “meaning” to practice prediction-making but not gotten to it yet to do so on one of the platforms, by offering a kind of “reason to get around to it now”; the answer was no, though.
I don’t participate much on Metaculus because for my actual, non-contest prediction-making practice I tend to favour predictions that resolve within about six weeks: the longer the time between prediction and resolution, the slower the iteration process for improving calibration. If I predict on 100 things that happen in four years, it takes four years for me to learn whether I’m over- or under-confident at the 90% or so mark, and then another four years to learn whether my reaction to that was an over- or under-reaction. Metaculus seems to favour predictions two to four or more years out, and requires sticking to private predictions if you want to create your own short-term ones in any number, which is interesting for getting a crowd read on the future but doesn’t offer me as much of an opportunity to iterate and improve. It’s a nice project, though.