Metaforecast update: Better search, capture functionality, more platforms.

Metaforecast is a search tool for probabilities. Since the last public update, we have made several improvements.

tl;dr:

  • Search is much better, and also faster.

  • Capture functionality now makes it more convenient to save predictions as images.

  • There is now a database open to the public.

  • Questions now have quality indicators like number of forecasters, number of forecasts, volume traded, liquidity, etc.

  • New platforms have been added: Kalshi, Betfair, Rootclaim, and CoupCast.

  • The website's initial load is faster.

The rest of the post outlines the details of the above improvements. However, we expect that most people will just find it more fun to try it out or to see all Metaforecast COVID predictions in a LessWrong post.

Details

Capture forecasts to use in blog posts

By clicking on “Capture” and then on “Capture image and generate code”, you can create an image out of any question, and save it on imgur. The Markdown and HTML snippets to display the captured image—which include a link back to the original question—are automatically generated.
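As a rough sketch of what such snippets look like, the following builds a Markdown and an HTML snippet from an image URL and a question URL. The function name `capture_snippets` and the example URLs are hypothetical, and the exact markup that Metaforecast generates may differ.

```python
from html import escape


def capture_snippets(image_url, question_url, title):
    """Return Markdown and HTML snippets that show an image linking back to a question.

    Hypothetical helper for illustration; not Metaforecast's actual code.
    """
    # Markdown: an image wrapped in a link to the original question.
    # Titles containing square brackets would need escaping; omitted for brevity.
    markdown = f"[![{title}]({image_url})]({question_url})"
    # HTML: the same structure, with attribute values escaped.
    html = (
        f'<a href="{escape(question_url, quote=True)}">'
        f'<img src="{escape(image_url, quote=True)}" alt="{escape(title, quote=True)}" />'
        "</a>"
    )
    return {"markdown": markdown, "html": html}


# Placeholder URLs, not real captures.
snippets = capture_snippets(
    "https://i.imgur.com/example.png",
    "https://example.com/question",
    "Example question",
)
print(snippets["markdown"])
print(snippets["html"])
```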

This makes it easier to add questions to documents, e.g., LessWrong posts. This is equivalent to just taking a screenshot, but perhaps faster and more convenient, so people might actually use it to add probabilities to their own posts.

Imgur limits uploads to around 50 images per hour per IP address, which seems reasonable. These limits apply per user rather than to Metaforecast as a whole, so feel free to experiment with it.

The alternative to this would be a “true embed”, where question images are generated on the fly by a server, and updated with the passage of time as the crowd forecast changes. This would have some advantages, but would require further effort.

Better search

Initially, Metaforecast used a custom search script on top of Fuse.js, an open-source fuzzy-search library. This was simple to implement, but resulted in a search that was fairly slow and suboptimal. We switched to Algolia, which has built-in support for synonyms, plurals, and stop-word removal, and generally has very nice default search capabilities. It also does some indexing in the background, so that more complex queries aren’t noticeably slower.
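To give a feel for what stop-word removal and plural handling buy you, here is a toy normalization sketch. It is not Algolia's algorithm; the stop-word list, the naive plural stripping, and the function names are all assumptions made for illustration.

```python
# Toy stop-word list, chosen for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "will", "be", "by", "to"}


def normalize(text):
    """Lowercase, drop punctuation and stop words, and naively strip plural endings."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,?!\"'()")
        if not word or word in STOP_WORDS:
            continue
        # Naive singularization: "elections" -> "election", but leave "loss" alone.
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        tokens.append(word)
    return tokens


def matches(query, title):
    """True if every normalized query token also appears in the normalized title."""
    title_tokens = set(normalize(title))
    return all(token in title_tokens for token in normalize(query))


print(normalize("Will the elections be held?"))
print(matches("election", "Will the elections be held?"))
```

With this kind of normalization, a query for "election" still finds a question that only mentions "elections", which plain substring or exact-token matching would miss.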

Personally, I’m sad to move to a non-open-source option, but Algolia is very noticeably better. It’s also free for under 10k records, which Metaforecast fits under. Elasticsearch is an open-source alternative, but it has a reputation for being more complex, so I shied away from it.

Faster initial page load

To avoid having a server always running in the background to deal with searches, the whole database was being loaded together with the webpage using Next.js’s static site optimization functionality. Now that the app is using Algolia, this isn’t needed anymore, and the initial page load feels faster.

Prediction question quality indicators

We’ve added some more quality indicators when the platforms themselves make them available. These are especially useful for prediction markets, where trade volume feels particularly informative.

Database download support

[note 2022/04: Metaforecast now has a GraphQL interface, at: metaforecast.org/api/graphql, which replaces this rudimentary functionality]
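A minimal sketch of how one might talk to that GraphQL interface from Python follows. It assumes the endpoint is served over https and accepts the usual JSON POST body; because the schema is not described here, it only sends a standard GraphQL introspection query, which lists the available top-level fields if the server allows introspection. The function names are hypothetical.

```python
import json
import urllib.request

# The https scheme is an assumption; the post only gives the host and path.
GRAPHQL_URL = "https://metaforecast.org/api/graphql"

# Standard GraphQL introspection: asks the server which top-level query fields exist.
INTROSPECTION_QUERY = "{ __schema { queryType { fields { name } } } }"


def build_graphql_request(query, url=GRAPHQL_URL):
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, url=GRAPHQL_URL):
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_graphql_request(query, url), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires network access):
#   print(run_query(INTROSPECTION_QUERY))
```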

There is now a database for questions, which is open for other people to download. The code to do this (which requires node and npm) is:

$ git clone https://github.com/QURIresearch/metaforecasts
$ cd metaforecasts
$ npm install
$ node src/utils/manualDownload.js

This downloads the metaforecast database as “metaforecasts.json”, using a public database key. Ought’s Elicit has used that for their “search forecasting questions task”. I’m also saving each day’s snapshot to a history database, which should become more meaningful as time goes on.

An example entry in metaforecasts.json might look like:

{
  "title": "Will more than 2.5 million people travel through a TSA checkpoint on any day on or before December 31?",
  "url": "https://polymarket.com/market/will-more-than-2pt5-million-people-travel-through-a-tsa-checkpoint-on-any-day-on-or-before-december-31",
  "platform": "PolyMarket",
  "description": "This is a market on whether more than 2.5 million people will travel through a TSA checkpoint on [...]",
  "options": [
    {
      "name": "Yes",
      "probability": "0.4516[...]",
      "type": "PROBABILITY"
    },
    {
      "name": "No",
      "probability": "0.5483[...]",
      "type": "PROBABILITY"
    }
  ],
  "timestamp": "2021-08-13T19:17:27.746Z",
  "qualityindicators": {
    "numforecasts": "12",
    "liquidity": "5807.70",
    "tradevolume": "540.73",
    "stars": 4
  },
  "optionsstringforsearch": "Yes, No"
}

This is not an immutable schema, and might vary by platform. For instance, Kalshi might have a “spread” property as a quality indicator. It is also liable to change in the future. For example, I might change variables to snake or camel case for ease of readability.
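Given that the schema can shift, code that consumes these entries is probably best written defensively. The sketch below reads option probabilities (stored as strings in the example) and looks up quality indicators while tolerating missing fields and alternative key spellings. The helper names and the snake/camel-case key variants are hypothetical, and the sample probabilities are cut to four decimals because the remaining digits are elided in the example above.

```python
import json

# Trimmed copy of the example entry; probabilities are cut to four decimals.
SAMPLE_ENTRY = json.loads("""
{
  "title": "Will more than 2.5 million people travel through a TSA checkpoint on any day on or before December 31?",
  "platform": "PolyMarket",
  "options": [
    {"name": "Yes", "probability": "0.4516", "type": "PROBABILITY"},
    {"name": "No", "probability": "0.5483", "type": "PROBABILITY"}
  ],
  "qualityindicators": {"numforecasts": "12", "liquidity": "5807.70", "tradevolume": "540.73", "stars": 4}
}
""")


def get_first(mapping, *keys, default=None):
    """Return the value of the first key present in the mapping, else the default."""
    for key in keys:
        if key in mapping:
            return mapping[key]
    return default


def option_probabilities(entry):
    """Map option names to float probabilities, skipping non-probability options."""
    return {
        option["name"]: float(option["probability"])
        for option in entry.get("options", [])
        if option.get("type") == "PROBABILITY"
    }


def quality_indicator(entry, *names):
    """Look up a quality indicator as a float; return None if absent or non-numeric."""
    # The second and third spellings are guesses at possible future key names.
    indicators = get_first(
        entry, "qualityindicators", "quality_indicators", "qualityIndicators", default={}
    )
    value = get_first(indicators, *names)
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


print(option_probabilities(SAMPLE_ENTRY))
print(quality_indicator(SAMPLE_ENTRY, "liquidity"))
print(quality_indicator(SAMPLE_ENTRY, "spread"))  # not present on this entry
```

Returning None for a missing indicator, rather than raising, lets the same code run over entries from platforms that expose different indicators.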

New Forecasting Platforms

Metaforecast now includes predictions from Kalshi, Betfair, Rootclaim, and CoupCast, and has better fetching for Hypermind, which was previously a pain point. With these, Metaforecast now has a bit over 3.5k questions to search from.