What is the Moonbird algorithm? Why do you call it ML? The algorithm alone won’t tell me much without the data used to train it.
The algorithm that first springs to mind is to treat every number of predictions separately and apply kNN (in logistic space?). Better: if there are N predictions, average over the kNN applied to every one of the 2^N subsets of the predictions. Maybe weight by how well trained the different lengths are.
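That subset-averaging idea could be sketched like this. Everything here (`subset_knn`, the `train_by_len` layout, the choice of `k`) is illustrative, not anything from Moonbird: for each nonempty subset of the N assignments, find the k nearest training sequences of the same length in logit space and average the resolved outcomes.

```python
import numpy as np
from itertools import combinations

def logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

def knn_estimate(query, train_X, train_y, k=5):
    """kNN in logit space over training sequences of one fixed length."""
    d = np.linalg.norm(train_X - query, axis=1)
    idx = np.argsort(d)[:k]
    return train_y[idx].mean()

def subset_knn(probs, train_by_len, k=5):
    """Average kNN estimates over every nonempty subset of the assignments.
    train_by_len maps length -> (X, y), where X holds sorted logit rows and
    y holds 0/1 resolutions; lengths with no training data are skipped."""
    estimates = []
    for r in range(1, len(probs) + 1):
        if r not in train_by_len:
            continue
        X, y = train_by_len[r]
        for subset in combinations(probs, r):
            q = np.sort(logit(np.array(subset)))
            estimates.append(knn_estimate(q, X, y, k))
    return float(np.mean(estimates))
```

Weighting by how well trained each length is would just mean replacing the flat `np.mean` with a weighted average over subset sizes.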
Tetlock tells us that although individuals are overconfident, crowds are underconfident, so once we’ve averaged, we should shift away from 0.5. This helps a bit in this case, but Moonbird does a lot better. I guess it’s increasing confidence when the crowd agrees and decreasing it when the crowd disagrees.
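One common way to implement that shift away from 0.5 is to scale the averaged probability’s log-odds by a factor greater than 1 (extremizing). The function name and the factor 2.5 here are illustrative, not values from Moonbird or Tetlock:

```python
import numpy as np

def extremize(p_mean, a=2.5):
    """Push a crowd-average probability away from 0.5 by scaling
    its log-odds by a > 1; a = 1 leaves it unchanged."""
    p = np.clip(p_mean, 1e-6, 1 - 1e-6)
    log_odds = np.log(p / (1 - p))
    return 1 / (1 + np.exp(-a * log_odds))
```

For example, `extremize(0.7)` lands somewhere near 0.9, while 0.5 stays fixed, which is the "shift away from 0.5 after averaging" behaviour described above.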
It’s not a novel algorithm type, just a learning project I did in the process of learning ML frameworks: a fairly simple LSTM + one dense layer, trained on the predictions + resolutions of about 60% of the resolved predictions from PredictionBook as of September last year (which doesn’t include any of the ones in the contest). The remaining resolved predictions were used for cross-validation or set aside as a test set. An even simpler RNN does only very slightly worse, though.
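For concreteness, the shape of that architecture (a sequence of probability assignments fed through one LSTM layer, then a single dense sigmoid unit) can be sketched as a hand-rolled forward pass. The hidden size, weight layout, and names here are illustrative, not taken from the repo:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_dense_forward(probs, W, U, b, w_out, b_out):
    """One LSTM layer over a sequence of probability assignments,
    then a dense sigmoid unit mapping the final hidden state to a
    predicted resolution probability.
    Shapes: W is (4H, 1), U is (4H, H), b is (4H,), w_out is (H,)."""
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for p in probs:                      # one timestep per assignment
        z = W @ np.array([p]) + U @ h + b
        i, f, o, g = np.split(z, 4)      # input, forget, output, candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
    return sigmoid(w_out @ h + b_out)    # the single dense layer
```

Because the hidden state is updated one assignment at a time, later inputs naturally get a chance to override earlier ones, which fits the order-sensitivity described below.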
The details of how the algorithm works are thus somewhat opaque, but from observing how it reacts to input, it seems to lean on the average, weight predictions later in the sequence more heavily (so order matters), and grow more confident as the number of predictions increases, while treating propositions with only a single probability assignment as probably heavily overconfident. It seems to have more or less learnt the insight Tetlock pointed out on its own. Disagreement might also matter to it; I’m not sure.
It’s on GitHub at https://github.com/jbeshir/moonbird-predictor-keras; this doesn’t include the data, which I downloaded using https://github.com/jbeshir/predictionbook-extractor. It’s not particularly tidy though, and still includes a lot of unused functionality for input features (the words of the proposition, the time between a probability assignment and the due time, etc.), which I didn’t end up using because the dataset was too small for it to learn any signal in them.
I’m currently working on making the online frontend to the model automatically retrain it at intervals using freshly resolved predictions, mostly for practice building a simple “online” ML system before moving on to building things with more practical applications.
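A periodic-retraining loop of the kind described might look roughly like this; `fetch_resolved` and `train` are hypothetical stand-ins for the real data pipeline and Keras training code, not functions from the repo:

```python
import time

def retrain_loop(fetch_resolved, train, interval_s=24 * 3600, max_cycles=None):
    """Periodically pull the latest resolved predictions and retrain.
    fetch_resolved() -> (sequences, outcomes); train(sequences, outcomes)
    -> fitted model. With max_cycles=None this runs forever."""
    model = None
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        X, y = fetch_resolved()
        model = train(X, y)   # full retrain on all resolved data so far
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_s)
    return model
```

In a real deployment the loop would also swap the freshly trained model into the serving frontend atomically, so in-flight requests keep using the old one until retraining finishes.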
The main reason I ran figures for it against the contest was that some of its individual confidences seemed strange to me, and while the cross-validation stuff was saying it was good, I was suspicious I was getting something wrong in the process.