Promoting Prediction Markets With Meaningless Internet-Point Badges
Intro
(x-post)
I’d like to live in a world where prediction-market use is common and high-prestige.
The easiest way for this to happen is for prediction markets with money to be legal.
In the absence of this, there might nevertheless be some low-hanging fruit for a point-based prediction market (Metaculus, or some unidentified contender) to promote the wide acceptance of prediction markets. The same move might also improve the general quality of journalism.
Proposal
The prediction market creates a new feature. The feature lets a user of the market create a small badge, embeddable on the user’s blog, Medium, Substack, or elsewhere, that displays the person’s username and a score measuring the accuracy of their predictions.
The score could be an absolute measure such as a Brier score, or a relative measure such as the percentile the person occupies within the market. It could also be colored according to the number of predictions the person has made; it could carry a tag indicating that this accuracy holds only within a particular subject or field; it could indicate the time horizon over which the person typically predicts; in general, there are many possible refinements.
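For concreteness, the Brier score is just the mean squared error between stated probabilities and the 0/1 outcomes, so lower is better and always guessing 50% on binary questions scores 0.25. A minimal sketch with made-up forecasts (illustrative only, not Metaculus’s actual scoring code):

```typescript
// A resolved forecast: the probability the forecaster gave, and how it turned out.
interface ResolvedForecast {
  probability: number; // forecast probability that the event occurs, in [0, 1]
  outcome: 0 | 1;      // how the question actually resolved
}

// Brier score: mean squared error between forecast probabilities and outcomes.
function brierScore(forecasts: ResolvedForecast[]): number {
  const totalSquaredError = forecasts.reduce(
    (sum, f) => sum + (f.probability - f.outcome) ** 2,
    0
  );
  return totalSquaredError / forecasts.length;
}

// A hypothetical track record of three resolved questions.
console.log(
  brierScore([
    { probability: 0.9, outcome: 1 },
    { probability: 0.2, outcome: 0 },
    { probability: 0.7, outcome: 0 },
  ])
); // 0.18
```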
All of the above details are important, but for the moment I put them to the side.
The badge could be displayed at the head of every article the author writes.
Codewise, this would work like any front-end widget managed by another server (a commenting system, a Twitter embed, and so on), so it would update live as the author’s predictions came true or failed to come true, even on older articles.
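As a rough sketch of what that embed could look like, assuming a hypothetical /api/badge/&lt;username&gt; endpoint exposed by the market (none of the names or URLs below are real Metaculus APIs):

```typescript
// badge.ts: a hypothetical embeddable badge script. The author pastes
// <div data-prediction-badge="some-username"></div> plus a <script> tag pointing
// at this file into their page template; the badge then renders itself on load,
// so it always reflects the latest score, even on older articles.

interface BadgeData {
  username: string;
  brierScore: number;   // lower is better
  percentile: number;   // relative standing within the market
  resolvedCount: number;
}

async function renderBadge(container: HTMLElement): Promise<void> {
  const username = container.dataset.predictionBadge;
  if (!username) return;

  // Hypothetical endpoint; a real market would expose something equivalent.
  const response = await fetch(
    `https://example-prediction-market.org/api/badge/${encodeURIComponent(username)}`
  );
  const data: BadgeData = await response.json();

  container.textContent =
    `${data.username} · Brier ${data.brierScore.toFixed(2)} · ` +
    `${data.percentile}th percentile · ${data.resolvedCount} resolved predictions`;
}

// Render every badge placeholder once the host page's DOM is ready.
document.addEventListener("DOMContentLoaded", () => {
  document
    .querySelectorAll<HTMLElement>("[data-prediction-badge]")
    .forEach((el) => void renderBadge(el));
});
```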
(This badge could be supplemented, of course, by embedded questions from prediction markets, which could be placed in articles. Metaculus already has these.)
Possible Points in Favor
People are tired of shitty media. There’s an enormous groundswell of media distrust from many angles, as far as I can tell. A measure like this is easy to understand, at least in the basics, and provides clear evidence of credibility for those who use it, entirely independent of trust.
It also evens the credibility playing field between individuals and large agencies, which could be popular.
People like little badges if they grant status. If the first users of this are sufficiently high-prestige, or if predictions / articles made by users of this badge gain fame, then many people will want this. (After all, people wanted to get the Twitter verified badge, right?) This could lead more people to the prediction market, which would be good.
Tying narrative to numbers helps broad acceptance of prediction markets. Prediction markets are great, but prediction markets are not stories, and people love stories. Having people write journalist-y narratives within the context of their personal predictions could then make prediction markets more popular, while also constraining said people to attend more carefully to the truth.
Possible Points Against
Writers don’t want auditability. This is true; a lot of writers do not. If enough writers start using this, though, ideally the lack of such a badge would be considered strong evidence the writer does not take truth seriously, and it would therefore become in the interests of writers to include it.
People just won’t start using it. I think the most difficult part here is getting an initial cohort of writers to start using such a badge. A prediction market could help with this by enlisting some famous people to start using it. But I freely admit early acceptance is the trickiest part. I’m not sure what the best approach is.
There’s a host of generic objections that apply equally well to all prediction markets, which I will not address here.
Honestly, I’m not sure whether this would work. But I think there’s a possible world where it could help a lot.
As a top-50 metaculuser, I endorse all proposals that give me more status.
“Top-50 Metaculus user” is an ambiguous term. Metaculus’s public metric has little to do with how good someone is at predicting Metaculus questions and a lot to do with how many predictions a person makes.
Metaculus has decided against releasing something like a top-100 list of users ranked by Brier score among those with at least 100 resolved predictions.
If Metaculus actually decided to care about Brier score when ranking people, you might lose your status. Writing this, I’m curious what your username on Metaculus happens to be, as it might be worth spending the tachyons to figure out how good you are at making Metaculus predictions.
https://www.metaculus.com/accounts/profile/114222/
My current all-time Brier is .1, vs. .129 for the Metaculus prediction and .124 for the community prediction on the same questions.
I’m also in the top 20 in points per question on https://metaculusextras.com/points_per_question
Both of those metrics heavily depend on question selection, so it’s difficult to compare people directly. But neither have to do with volume of questions.
Yes, those metrics seem to indicate that you would actually gain status with broad knowledge of your prediction performance ;)
That’s surprising to me. I thought the Metaculus prediction was generally better than the community prediction?
I would love to live in this world.
This seems like a really hard problem: if a market like this “wins,” so that having a lot of points makes you high-status, people will try to game it, and if gaming it is easy, this will kill respect for the market.
Specific gaming strategies I can think of:
Sybil attacks: I create one “real” account and 100 sock puppets; my sock puppets make dumb bets against my real account; my real account gains points, and I discard my sock puppets. Defenses I’ve heard of against Sybil attacks: make it costly to participate (e.g. proof-of-work); make the cost of losing at least as great as the benefit of winning (e.g. make “points” equal money); or do Distributed Trust Stuff (e.g. Rangzen, TrustDavis).
Calibration-fluffing: if the market grades me on calibration, then I can make dumb predictions but still look perfectly calibrated by counterbalancing those with more dumb predictions (e.g. predict “We’ll have AGI by Tuesday, 90%”, then balance that out with nine “The sun will rise tomorrow, 90%” predictions). To protect against this… seems like you’d need some sort of way to distinguish “predictions that matter” from “calibration fluff.”
Buying status: pay people to make dumb bets against you. The Metaculus equivalent of buying Likes or Amazon reviews. On priors, if Amazon can’t squash this problem, it probably can’t be squashed.
Note that this could be mitigated by other people being able to profit off of obvious epistemic inefficiencies in the prediction markets: if your bots drive the community credence down super far, and if other people notice this, then other people might come in and correct part of the issue. This would reduce your advantage relative to other Metaculites.
This is not really a problem at Metaculus. Metaculus has metrics for a player’s prediction, the community prediction, and the Metaculus prediction on the questions someone answers.
The community prediction could be changed with sockpuppets, but the Metaculus prediction can’t. You can judge people on how near they are to the Metaculus prediction in predictive accuracy, or on whether they even outperform it.
Metaculus, however, chooses to keep that metric mostly private. Metaculus has the problem of not wanting to embarrass users who make a lot of predictions when those predictions are on average bad.
I don’t think Amazon makes a serious effort at battling review fraud, just as YouTube doesn’t make a serious effort at comment quality when it could easily do something about it.
Amazon also has a harder problem because its ground truth is less clear.
For scoring systems, rather than betting markets, none of these particular attacks work. This is trivially true for the first and third attacks, since you don’t bet against individuals. And for any proper scoring rule, calibration-fluffing is worse than predicting your true odds for the dumb predictions. (Aligning incentives is still very tricky, but the set of attacks is very different.)
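To see why, here is a toy calculation using the Brier score (one proper scoring rule among many); the numbers just reuse the example from the comment above, and nothing here is actual Metaculus or GJP scoring code:

```typescript
// Expected Brier penalty of reporting probability `reported` when your true
// credence is `truth`: E[(reported - outcome)^2] under your own beliefs.
// Lower is better, and it is minimized exactly at reported === truth.
function expectedBrier(truth: number, reported: number): number {
  return truth * (reported - 1) ** 2 + (1 - truth) * reported ** 2;
}

// Suppose your true credence in "AGI by Tuesday" is 1%.
console.log(expectedBrier(0.01, 0.9));   // ~0.802: the dumb 90% prediction
console.log(expectedBrier(0.01, 0.01));  // ~0.010: just reporting your true odds

// Each "the sun will rise tomorrow, 90%" fluff prediction also costs ~0.011
// (true credence ~0.999, reported 0.9), versus ~0.001 for reporting honestly.
// Fluffing dilutes the average but never repairs it: under a proper scoring
// rule, honesty beats fluffing on every single question.
console.log(expectedBrier(0.999, 0.9));   // ~0.011
console.log(expectedBrier(0.999, 0.999)); // ~0.001
```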
There’s a close analogue, which is getting accepted as a superforecaster by the Good Judgement Project by performing in the top 1%, I believe, on Good Judgement Open. (They have badges of some sort as well for superforecasters.) I’ll also note that the top-X Metaculus score is a weird and not-great metric to try to get people to maximize, because it rewards participation as well as accuracy; for example, you can get tons of points by just always guessing the Metaculus average and updating frequently, though you’ll never overtake the top people that way. And contra ike, as a rank 50-100 “metaculuser” who doesn’t have time to predict on everything and get my score higher, I think we should privilege that distinction over all the people who rank higher than me on Metaculus. ;)
I will say that I think there’s already a reasonable amount of prestige in certain circles for being a superforecaster, especially in EA- and LW-adjacent areas, though it’s hard for me to disentangle how much prestige is from that versus other things I have been doing around the same time, like getting a PhD.
Yes, you should definitely milk your PhD for as much status as possible, Dr. Manheim.
Having been a rank 50-100 “metaculuser” myself before, I completely agree (currently at rank 112).
Good to see so many of us moderately good forecasters are agreeing—now we just need to average then extremize the forecast of how good an idea this is. ;)
I don’t know what performance measure is used to select superforecasters, but updating frequently seems to usually improve your accuracy score on GJ Open as well (see “Activity Loading” in this thread on the EA Forum).
Yes, it’s super important to update frequently when the scores are computed as time-weighted. And for Metaculus, that’s a useful thing, since viewers want to know what the current best guess is, but it’s not the only way to do scoring. But saying frequent updating makes you better at forecasting isn’t actually a fact about how accurate the individual forecasts are; it’s a fact about how they are scored.
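To illustrate, here is a toy simulation with a simplified time-weighted Brier score (not the actual GJ Open or Metaculus formula): the frequent updater ends up with a better average purely because each day’s standing forecast keeps getting scored, so stale forecasts drag the average down.

```typescript
// Time-weighted Brier: each day's standing forecast is scored against the
// eventual outcome, and the daily scores are averaged over the question's life.
// (A simplified stand-in for GJ Open / Metaculus scoring, not the real formula.)
function timeWeightedBrier(dailyForecasts: number[], outcome: 0 | 1): number {
  const dailyScores = dailyForecasts.map((p) => (p - outcome) ** 2);
  return dailyScores.reduce((a, b) => a + b, 0) / dailyScores.length;
}

const outcome = 1;  // the question eventually resolves "yes"
const days = 10;

// "Set and forget": a single 70% forecast left standing for the whole question.
const setAndForget: number[] = new Array(days).fill(0.7);

// "Active updater": the same starting view, nudged up a little each day as
// evidence accumulates, ending at 97%.
const activeUpdater: number[] = Array.from({ length: days }, (_, d) => 0.7 + 0.03 * d);

console.log(timeWeightedBrier(setAndForget, outcome));  // 0.09
console.log(timeWeightedBrier(activeUpdater, outcome)); // ~0.035
```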
If this were true, we would expect to see declining media consumption: reduced viewership at Fox and CNN, for example. Instead the opposite is true; both reported record viewership this year. I take that to mean that the problem with journalism, insofar as there is one, is on the demand side rather than the supply side.
So, in general I think this claim is false. I would focus on finding a small subgroup for which it’s true, and dedicate your efforts to them.
I like this idea! It certainly seems worth trying.
I want one. Give me the internet points.
Twitch has recently begun experimenting with predictions for streamers using their channel-points currency.
It would have to be a third-party service, perhaps appended to people’s names via a Chrome extension at first?
It would also have to be publicly auditable, so people could see each individual prediction that led to the overall score, similar to an Amazon rating or a Rotten Tomatoes score.
As predictions are rated, you can also begin to introduce other ratings and classifiers beyond just accurate or inaccurate.
The credibility link would not be associated with trust in the newspaper but with trust in the judges of the prediction market. It might be that having a single authority whose one job is to make judgements on what counts as “objective results” is more efficient than current arrangements. But it is not clear that you could convince random people that you are such a fair checker simply by using a scoring system.
It seems hard for me to see how somebody who gets a low score would continue to believe that the giver of the low score is a good authority on others.
Stuff like Augur or Reality for decentralized oracles is cool for solving this.