Quadratic voting for the 2018 Review
LessWrong is currently reviewing the posts from 2018, and I’m trying to figure out how voting should happen. The new hotness that all your friends are talking about is quadratic voting, and after thinking about it for a few hours, it seems like a pretty good solution to me.
I’m writing this post primarily for people who know more about this stuff to show me where the plan will fail terribly for LW, to suggest UI improvements, or to suggest an alternative plan. If nothing serious is raised that changes my mind in the next 7 days, we’ll build a straightforward UI and do it.
I’ve not read anything about it, so briefly, what is quadratic voting?
I’m just picking it up, so I’ll write a short explanation of it as I understand it, and will update this if it’s confused/mistaken.
As I understand it, the key insight behind quadratic voting is that everyone can cast multiple votes, but that the marginal cost of votes is increasing, rather than staying constant.
With voting mechanisms where the cost per vote stays constant, people are always incentivised to spend all their votes on their favourite option. This is similar to how, under naive scoring rules, people bet all their chips on the outcome they think is most likely, rather than spreading their money in proportion to their actual probability mass. You have to think carefully to design a proper scoring rule that incentivises users to write down their full epistemic state.
With quadratic voting, instead of spending all your votes on your favourite option, the more you spend on an option the more each additional vote on it costs, so other options start looking more worthwhile. Your first vote on a post costs 1, your second vote costs 2, your nth vote costs n. And since you have a limited budget of cost you can pay, you have to make comparisons about where each marginal vote is best spent.
Concretely, there are two things for a person voting in this system to keep track of:
Votes is the total number of votes you cast.
Cost is the total cost you pay for those votes.
If your votes are:
Post A: 5 votes
Post B: 8 votes
Post C: 1 vote
Then your numbers are:
Votes is 5 + 8 + 1 = 14
Cost is (1 + 2 + 3 + 4 + 5) + (1 + 2 + … + 8) + 1 = 15 + 36 + 1 = 52
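Here is a minimal sketch of that bookkeeping in code, assuming the cost rule described above (the nth vote on a post costs n) and that negative votes cost the same as positive ones, as discussed in the objections section below. The names are just illustrative:

```python
def vote_cost(votes: int) -> int:
    """Total cost of putting `votes` votes on a single post.

    The nth vote on a post costs n, so k votes cost 1 + 2 + ... + k = k(k + 1) / 2.
    A negative vote costs the same as a positive one, so take the absolute value.
    """
    k = abs(votes)
    return k * (k + 1) // 2


ballot = {"Post A": 5, "Post B": 8, "Post C": 1}

total_votes = sum(ballot.values())                         # 5 + 8 + 1 = 14
total_cost = sum(vote_cost(v) for v in ballot.values())    # 15 + 36 + 1 = 52
print(total_votes, total_cost)                             # 14 52
```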
This is a system that not only invites voters to give a rank ordering, but also to give their price for marginal votes on different posts. This information makes it much easier to combine everyone’s preferences (though how much weight to give each person’s preferences is a free variable in the system; democracies weight everyone equally, and on LW we can consider alternatives like karma-weighting).
It’s called quadratic voting because the sum of all numbers up to n is roughly half of n² (exactly n(n+1)/2). I think (but don’t know) that it doesn’t really matter exactly how much each vote costs, as long as the cost is increasing, because then it causes you to calculate your price for marginal votes on different options. “Price Voting” might have been a good name for it.
Other explanations I’ve seen frame it as each marginal vote counting for less, rather than costing more, but I’m pretty sure this has an identical effect.
Vitalik Buterin has written more about it here.
How would this work in the LessWrong 2018 Review?
So, we’re trying to figure out which were the best posts from 2018, in an effort to build common knowledge of the progress we made, and also to reward thinkers for having good ideas. We’ll turn the output into a sequence and, with a bit of editing, I’ll also make it into a physical book.
(Note: Content will only be published with author’s explicit consent, obviously.)
To do this vote, I’d suggest the following setup:
All users over a certain karma threshold (probably 1000 karma, which is ~500 users) are given an input page where they can vote on all posts that were nominated that year.
The total Cost each user can spend is set to 500.
Voting will be open for 2 weeks. You can save your votes and come back to edit them at any time during the two weeks.
While the votes are being cast, we’ll publish a ‘snapshot’ 1 week in, showing what the final result would be if voting closed that day. This will help users understand what output the votes are connected to, and help you figure out which posts you want to write reviews for (i.e. to help other users understand why you think a post is being under- or overvalued).
At the end, I’ll publish a few different aggregations of the votes, such as karma-weighted, not-karma-weighted, and maybe some other ways of combining rank orderings. We’ll do something like select the top N posts whose word count sums to ~100k words (roughly a 350-page book) to be in the final sequence. I expect this will be around 25-30 posts.
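For concreteness, here is a rough sketch of what the aggregation step could look like. Everything in it is illustrative: the post doesn’t specify the karma-weighting function (log-karma below is just one option), and the data structures aren’t the actual LW schema.

```python
import math
from typing import Dict, List, Tuple

def aggregate(votes: Dict[str, Dict[str, int]],   # votes[user][post] = signed vote count
              karma: Dict[str, int],              # karma[user] = that user's karma
              word_count: Dict[str, int],         # word_count[post] = post length in words
              karma_weighted: bool = False,
              word_budget: int = 100_000) -> Tuple[Dict[str, float], List[str]]:
    # Sum each post's votes, optionally weighting each voter by log-karma
    # (one illustrative choice; the post only says "karma-weighted").
    totals: Dict[str, float] = {}
    for user, ballot in votes.items():
        weight = math.log(max(karma[user], 2)) if karma_weighted else 1.0
        for post, v in ballot.items():
            totals[post] = totals.get(post, 0.0) + weight * v

    # Greedily take the highest-scoring posts until the ~100k-word budget is used.
    selected: List[str] = []
    used = 0
    for post in sorted(totals, key=totals.get, reverse=True):
        if used + word_count[post] <= word_budget:
            selected.append(post)
            used += word_count[post]
    return totals, selected
```

Running this with karma_weighted on and off gives the two aggregations mentioned above.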
Some objections
What if I have reason to think a post is terrible?
Negative voting is also allowed, where a negative vote costs the same as a positive vote, i.e. −4 votes costs the same as 4 votes. I think this probably deals with such cases fine.
What about AI alignment writing, or other writing that I feel I cannot evaluate?
Let me first deal with the most obvious case here, which is AI alignment writing. I think these posts are important research from the perspective of the long-term future of civilization, but they aren’t of direct interest to most users of LessWrong, and, more importantly, many users can’t personally tell which posts are good or bad.
For one, I think that alignment ideas are very important and I plan to personally work on them in a focused way, so I’m not too worried about them not all getting recognised in this review process. I’d ideally like to make books of the Embedded Agency, Iterated Amplification, and Value Learning sequences, for example, so I’m not too bothered if the best ideas from each of those aren’t in the LessWrong 2018 book.
For two, I think there is a deep connection between AI alignment and rationality, and much of the key machinery for thinking about both (Bayes, information theory, embedded agency, etc.) has been very useful to me both for thinking about AI alignment and for personal decision making. I think that some of the best alignment content consists of deep insights that many rationalists will find useful, so I do think some of it will pass this bar.
For three, I trust users to have a good sense of what content is valuable. I think I have a sense of what alignment content is useful in part because I trust other users in a bunch of ways (Paul Christiano, Wei Dai, Abram Demski, etc.). There’s no rule saying you can’t update on the ideas and judgement of people you trust.
Overall I trust users and their judgments quite a bit, and if users feel they can’t tell if a post is good, then I think I want to trust that judgment.
Is this too much cognitive overhead for LW users?
I am open to ideas for more efficient voting systems.
A key thing about spreading the vote over two weeks is that users don’t have to do it in a single sitting. Personally, I feel like I can cast my quadratic votes on 75 posts in about 20 minutes, and will then want to come back to it a few days later to see if it still feels right. However, Ray found it took him more like 2 hours of heavy work, and feels the system will be too complicated for most people to get a handle on.
I think the minimum effort is fairly low, so I expect most users to have a fine time, but I’m happy to receive public and private feedback about this. In general I’ll likely make a brief survey afterwards about how the whole process went.
I am generally very skeptical about quadratic voting. In my opinion:
1. The nice calculus identities that it satisfies are optimal under an assumption of individual strategic voting.
2. There are some kinds of elections where people are mostly motivated to be strategic, and others where they’re mostly motivated to be honest. Basically, strategy tends to happen when there are factions, “us vs. them”; otherwise, people don’t bother.
3. But when there are factions, that means there will tend to be group-level strategic voting. And QV is no more robust, and possibly more vulnerable, to group-level strategy than other voting methods.
4. There are ways to patch QV to “fix” (3); but, as with many voting method patches, you create 2 new problems for every 1 you fix.
At a meta level: overall, my level of confidence in each of those points above is not particularly high. Say, on the order of 70% confident in each, and they’re roughly independent. So that would mean that a chain of logic that relied on all four being true would only be roughly 25% reliable. But I suspect that for QV to be a bad idea, it’s not necessary that all 4 of them are perfectly true; “most of them are mostly true” would be enough. So, say, 50% confidence that it’s a bad idea. If your prior was 75% that it’s a good idea, and if you trust me completely, you’d now think that it’s 37.5% a good idea. (Which might be good enough to be worth a try, given that failure in this case wouldn’t be so very bad.)
Also meta: Honestly, I don’t think it’s too arrogant to say that I doubt there’s more than a handful of people in the world more qualified to opine on this than I am.
Gotta go now, but I’ll respond on this more later, with an actual suggestion.
Robin Hanson makes a similar point here.
However, I’m not sure what sorts of collusion you’re worried about for this round (but I haven’t thought much about it)?
My understanding is that collusion in QV looks like:
1. People hijacking what bills get put up for vote in order to bankrupt people who want to veto the bill
2. People splitting their funding contributions across multiple fake identities in order to extract more subsidies
3. People coordinating their votes with others (because rather than me buying x votes it’s cheaper that I only buy x-y and “pay for that” by spending money on someone else’s preferences)
1 and 2 won’t be a problem for the review since you have a set number of voters with known identities, as well as a set number of posts to vote on. So I presume you’re worried about vote trading as in 3?
There’s a variant on quadratic funding called pairwise quadratic funding that aims to make naive collusion much less useful: https://ethresear.ch/t/pairwise-coordination-subsidies-a-new-quadratic-funding-design/5553
AFAICT it hasn’t been adapted yet to quadratic voting, but I’d love to see LW be the first ones to do so.
One of the things I really love about pairwise bounded QV is that it actually disincentivizes even unconscious collusion. In a democratic republic like the US with a traditional voting scheme, I’m incentivized to find issues on which I agree with others, so that I have more voting power.
In a pairwise bounded QV voting scheme, I’m actually incentivized to find issues that I care about that are neglected by others, and vote on those (as my votes will literally be worth more).
Of course, one of the biggest issues with pairwise bounded QV is that it makes it much harder to figure out how much an individual vote will be worth on any given issue, as it depends on how correlated my votes are with others who vote on the same issues.
Still interested in your suggestions. :)
I’m (personally) less convinced than Oli, Jim or Ben that quadratic voting (or any system with extra complexity to address strategic voting) is the way to go here. I don’t really think this is a situation where people are pressured to vote strategically, at least not very strongly.
Honestly, it’s harder to come up with a better suggestion than I would have thought. This is nearly the ideal use case for quadratic voting:
Reasonably engaged, and very nerdy, voting population
Relatively high mutual respect and common values; relatively low factionalism
The meta-goal is actually as much increasing overall engagement and building community, as it is choosing the optimal winner set.
QV is a shiny new thing, and the math behind it is cool.
The very definition of “ideal winner set” isn’t well-specified. Do you want proportionality? That is, if there is a 30% faction that loves things other people hate, should they decide 30% of the winner set, or should the algorithm try to find good compromise options that everyone can live with but nobody loves, or something else?
Overall, without hearing more about what your real goals are with this, I guess my best suggestions would be:
Include options to vote “score voting style” (bounded ratings) or “quadratic style” (ratings with bounded euclidean norm). I’d suggest scaling the SV votes so that their average euclidean norm is the same as that of the QV votes. (The strategy in this case is relatively obvious, but the strategic leverage isn’t too high, and the stakes are relatively low, so I wouldn’t worry too much.)
For the QV ballots, draw visualizations: spirals made up of successive right triangles, so that the first rating is an adjacent side, each further rating is an opposite side, and the root-sum-squares is the final hypotenuse.
If you did want a proportional method, I’d probably suggest something like E Pluribus Hugo with quadratically-scaled ballots behind the continuous part. That is actually not too too complicated (voters who didn’t want to get too complicated would be free to vote approval-style), and proportional, and quite robust to strategy.
This all makes a lot of sense; I’m glad to hear you say it. I think that the option for ‘score voting style’ is quite good; we were in fact seriously considering doing something like that.
I really like the idea of producing a visualisation as the user makes their votes up. That sounds delightful.
Yeah. As I understand it, this just means that you sum the squares of the SV and QV votes, then linearly scale all the votes of one group such that these two numbers are equal to one another. And then you’ve got them on the same playing field. And this is a trivial bit of computation, so we can make it so that if you’re voting in SV but then want to move to QV to change the weights a little, when you switch we can automatically show you what your score looks like in QV (er, rounded; there’ll be tons of fractions by default).
Instant Runoff seems to be optimising for outcomes about which the majority have consensus, which isn’t something I care as much about in this situation. That said I don’t fully understand how it would change the results.
… such that the averages of these numbers are equal, yes. I think that the way you said it, you’d be upscaling whichever group had fewer voters, but I’m pretty sure you didn’t mean that.
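A sketch of that scaling with the correction applied (matching the average euclidean norm per ballot, rather than the raw sums), on made-up ballots:

```python
import numpy as np

# Hypothetical ballots: one row per voter, one column per post (illustrative data only).
sv_ballots = np.array([[3.0, 0.0, 7.0], [1.0, 5.0, 2.0]])   # score-voting style
qv_ballots = np.array([[2.0, 2.0, 4.0], [0.0, 6.0, 1.0]])   # quadratic style

# Average euclidean norm of each group's ballots.
sv_norm = np.linalg.norm(sv_ballots, axis=1).mean()
qv_norm = np.linalg.norm(qv_ballots, axis=1).mean()

# Rescale the SV ballots so their average norm matches the QV ballots'.
# Matching averages (rather than sums) avoids upweighting whichever
# group happens to contain more voters.
sv_scaled = sv_ballots * (qv_norm / sv_norm)

post_scores = np.vstack([sv_scaled, qv_ballots]).sum(axis=0)
```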
E Pluribus Hugo, and more generally, proportional representation, have nothing to do with Instant Runoff, so I’m not sure what you’re saying here.
The second paragraph in the linked post says:
The Hugos use EPH for nominating finalists, then IRV to choose winners from among those finalists. Those are entirely separate steps. I was talking about the former, which has no IRV involved. I apologize for being unclear.
This is similar to what I was personally imagining, and what I think I’d personally want.
When I went through the 75 posts myself, imagining voting for them, what I found was that I basically wanted to put each post into one of a few buckets, something like:
“no” – not a contender for book
“decent” – a pretty neat idea, or a ‘quite good’ idea that wasn’t well argued for
“quite good” – some combination of “the idea is quite important; or, the conversation moved forward significantly; or, a neat idea was extraordinarily well argued for with excellent epistemics”
“crucial” – this is a foundational piece that I hope one day becomes ‘canon’
(I could imagine wanting to downvote posts, but in this case there weren’t any I wanted to rank lower than ‘no’)
One additional thing I kinda wanted out of this was the ability to flag (and aggregate data about) which posts had better or worse epistemic virtue. At first I thought of having two different voting scales, one for “value” and the other for “is this literally true, and/or did the author demonstrate thoughtfulness in how they considered the idea?”
I was worried about the obvious failure mode, where e.g. OkCupid creates a “personality” and an “attractiveness” scale, but it turns out the halo effect swamps any additional information you might have gleaned and the two scales end up mapping onto each other perfectly.
When I attempted to rate each post myself, what I found was that I almost always ranked epistemics and importance the same (or at least it wasn’t obvious that they were more than “1 point” away from each other on a 1-10 scale), but there were a few specific posts I wanted to flag as “punching above or below their weight epistemically.”
I’m not quite sure if this is worth any additional complexity. A simple option is to leave a “comments” box for each post where people can explain their vote in plain English. I’m a little sad that doesn’t give us the ability to aggregate information, though. (A simple boolean, er, three-option radio button, with optional ‘punches above its weight epistemically’ or ‘punches below its weight epistemically’, might work.)
We were in fact hoping you’d show up with opinions. :)
Throughout the OP my main question was what Jameson thinks about this. It felt a bit odd to me that a specific voting method was being advocated without at least some of his input.
(To be clear, I did PM him saying I’d love for him to comment on the post, if he had the time.)
Slightly nitpicky:
The update from the prior isn’t quite right here. I would have to consider what probability I would have assigned to you having the opinion outlined in your comment if the idea was bad vs if the idea was good.
As you’re only stating 50% confidence, it’s hard to distinguish good from bad, so the update would probably be of a lesser magnitude and would naively not move in either direction. My actual update would be away from the extremes: it probably isn’t amazing, but it probably isn’t terrible.
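To make that concrete with made-up numbers (the likelihoods below are purely illustrative): suppose you’d expect a comment roughly this sceptical with probability 0.6 if QV were a bad fit here and 0.4 if it were a good fit. Then

$$\frac{P(\text{good}\mid E)}{P(\text{bad}\mid E)} = \frac{P(\text{good})}{P(\text{bad})}\cdot\frac{P(E\mid \text{good})}{P(E\mid \text{bad})} = \frac{0.75}{0.25}\cdot\frac{0.4}{0.6} = 2,$$

i.e. roughly 67% that it’s a good idea rather than 37.5%: a much smaller update than multiplying the prior by the stated confidence.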
Here’s an intuition for why it’s important that it’s quadratic (based on standard microeconomic reasoning).
By spending your votes, you pay some cost, and get some benefit.
The cost consists in: voting credits, time, attention, energy, reputation costs if you have odd views...
The benefit consists in: higher probability your post-of-choice ends up in the book, which comes with a host of externalities like further influence of your values and epistemics on readers of the book.
As long as cost < benefit, you want to keep voting (otherwise you’d be leaving benefits on the table). You’ll do this until the cost of the last vote = benefit of the last vote.
If your total cost of voting is f(V) = V²/2, then the cost of each marginal vote is f′(V) = 2V/2 = V. By the above reasoning, you’ll stop voting when your marginal benefit = marginal cost = V.
Hence, your distribution of votes V₁, V₂, … across options o₁, o₂, … will measure how much you value each option being included in the book.
(After I’d written this I found this blog post by Vitalik which explains it even better.)
An issue with the proposal is the failure of the assumption that the utility of votes is proportional to the number of votes.
It seems plausible to me that there’s some threshold of votes above which a post will very likely end up in the book. If this is true, then I think my utility in buying more votes for that post would be ~linear up until that threshold, and then flat (ignoring potential down-voters).
If this is the case, I’d significantly understate my preference for this post, instead spending my points elsewhere.
So, for example, Embedded Agency might be undervalued by this scheme.
This point isn’t relevant when thinking about quadratic voting for elections with millions of voters, since then it makes more sense to assume that the probability of passing a proposal is linear in the number of votes I can influence.
Moreover, this would lead to weird equilibrium dynamics…
If I’ve voted for a post that gets above the threshold, then I want to remove my votes and place them elsewhere. If I don’t do this, but other users do, then I am effectively subsidising their preferences.
I don’t know how this would pan out, and can see it messing things up as everyone tries to model everyone else and be clever.
It seems like an open numerical question whether this issue would be relevant for the current round (i.e. whether the utility of most users would be linear in the region of influence they could expect with their votes):
Here are some numbers I wrote down, though I don’t really know how to take this much further. Under Ben’s initial scheme, each user can buy at most ~32 votes per post, by spending all their money. There are ~500 voting users, which gives an upper bound of ~16,000 votes for a post. There are ~75 nominated posts, out of which ~25 will end up in the book. If all users distributed their votes uniformly, we’d have about ~3.65 votes per user per post, and ~1,800 votes per post. Let’s handwave and say that with 7,000+ votes a post is as good as guaranteed for the book.
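A quick back-of-the-envelope script reproducing those numbers, assuming the smooth cost model f(V) = V²/2 used elsewhere in this thread (so a budget B buys √(2B) votes on a single post):

```python
import math

BUDGET = 500      # cost budget per voter (from the OP)
N_VOTERS = 500    # approximate number of eligible voters
N_POSTS = 75      # nominated posts

# Max votes a single user can put on one post by spending their whole budget.
max_votes_per_post = math.sqrt(2 * BUDGET)                  # ~31.6, i.e. ~32

# Upper bound on the total votes a post could receive.
upper_bound = N_VOTERS * max_votes_per_post                 # ~16,000

# If every user spread their budget evenly across all nominated posts.
cost_per_post = BUDGET / N_POSTS                            # ~6.7
votes_per_user_per_post = math.sqrt(2 * cost_per_post)      # ~3.65
avg_votes_per_post = N_VOTERS * votes_per_user_per_post     # ~1,800
```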
I think a solution to this might be that, instead of voting on what should be in the book, you decide on some subsidy pile of karma + money, and use quadratic funding to decide how to allocate that pile to each post (then giving it to the author / eventual co-authors).
You might just include the top posts in this scheme in the book, also making sure to make their scores prominent (and perhaps using scores in other ways to allocate attention inside the book, e.g. how many comments you include).
It seems more plausible to me that under this scheme users’ utility would be linear in the number of votes / amount of funding they allocate to posts.
This makes a lot of sense, thanks for writing it down.
Aside: I think it’s way harder for readers to read stuff bulleted like this, and would personally find it easier if this was identical but without the bulleting.
I came here to say ‘nah bro bullets are great’ and then I looked at the parent and was like ‘oh geez, that’s too many bullets’.
One interesting way to get around this problem would be to have votes be private.
This would probably create other weird dynamics.
Don’t think that would help: instead of knowing the actual votes Vᵢ for post i, I would have some distribution P(Vᵢ) over the votes cast for i, and my intuition is that as long as I have sufficient probability mass above the threshold, it would skew my incentives.
But doesn’t the knowledge that everyone else is also doing this converge to just stating my true preferences… Or something? I don’t really have a feel for the game theory behind this, but it feels like knowing that everyone is trying to vote strategically makes it hard for me to count on other people voting for things that I don’t vote on.
I don’t expect everyone to vote strategically. In fact, I expect most users to act in good-faith and do their best. I still think these things can be a problem.
That’s interesting, because I expect most people to vote strategically when using QV. The structure of QV heavily encourages thinking about the value of each marginal vote.
Trying to allocate your budget truthfully in accordance with your preferences about posts != trying to game the rules as an unbounded EV-maximiser would.
Strategic voting for me = trying to think about how much value your vote has relative to the outcome you’re trying to achieve. I don’t see, for instance, looking at how many people have already voted on something as “gaming the rules”; it just changes the value of a marginal vote of my own. I expect most people to think like that, because QV already makes you think about the marginal value of another vote.
I agree that you will stop voting on a post when the marginal cost = marginal benefit. This means if I take my votes from above:
Post A: 5 votes
Post B: 8 votes
Post C: 1 vote
This means I valued a vote on post A at price 5, post B at price 8, and post C at price 1, where the prices are all relative to each other.
So my noob question is: don’t I get this relative pricing without having to say what the marginal cost function is? If each marginal vote had cost 2n, wouldn’t I still have stopped at the same relative prices? Like, if the above votes were made with the marginal vote costing n, here’s what it would look like if the marginal vote cost 2n:
Post A: 2 votes
Post B: 4 votes
Post C: 0 votes
Which are all basically the same ratios (with a little bit less resolution), right?
If you scale it by a constant k that will happen (the constant just sticks around in the derivative, so you’ll stop buying votes on an option when k × (number of votes) = your marginal benefit, i.e. when the number of votes = marginal benefit / k).
If you were to use, say, f(V) = V³/3, then each marginal vote would cost f′(V) = V², and so you’d buy a number of votes V such that V = √B (where B is your marginal benefit).
Some of the QV papers have uniqueness proofs showing that quadratic voting is the only voting scheme that satisfies some of their desiderata for optimality. I haven’t read them and don’t know exactly what they show.
Yeah, I guess I’m hearing that all versions of it still cause the voter to work out prices, and that the information is findable by transforming their votes, but that quadratic doesn’t require doing any transformation and makes things simple.
I think that it’s not just about having an easier time reverse engineering people’s values from their votes. It might be deeper. Different rules might cause different equilibria/different proposals to win, etc. However I’m not sure and should probably just read the paper to find out the details.
For the specific goals that quadratic voting was designed for (figure out how much to fund public goods; assume each voter is a self-interested rational actor; assume no collusion) it is important that cost grows quadratically (and not, say, cubically). (Of course, the LW review is different enough that this wouldn’t apply.)
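One way to see why the exponent matters (a standard derivation, filling in the reasoning sketched in the comments above): with a cost function f(V) = V^p / p, the marginal cost is f′(V) = V^(p−1), so a voter with marginal benefit B stops buying votes at

$$V = B^{1/(p-1)}.$$

Only p = 2 makes the number of votes directly proportional to how much the voter values the outcome; p = 3 gives √B, which compresses differences in how much voters care, and exponents close to 1 recover the all-in behaviour of constant-cost voting.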
Nitpick: quadratic voting and quadratic funding are technically different schemes. In the former you vote for specific bills that can either pass or not pass. In the latter you fund projects and your donations are matched according to a particular formula.
However, there is a close correspondence between them. One way to see it is as follows. Quadratic funding can be seen as a vote, using the quadratic voting scheme, on a bill along the lines of “pay $X from Voter 0 (the subsidy pool) to the project”, where the quadratic funding subsidy formula is the maximum X for which Voter 0 will not pay to stop the bill.
In more detail: to prevent the bill, Voter 0 must buy more votes than all the other voters (who are voting against them) combined. That is, V₀ > V₁ + … + Vₙ. If we use the cost function C(V) = V², then each of the other voters bought Vᵢ = √Cᵢ votes with their contribution Cᵢ. This means that in order to prevent the bill from passing, Voter 0 must pay C(V₀) > (V₁ + … + Vₙ)² = (√C₁ + … + √Cₙ)². But this is exactly the subsidy formula from quadratic funding.
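As a tiny numeric sketch of that formula (illustrative contributions only): the total funding a project gets under quadratic funding is (√C₁ + … + √Cₙ)², and the matching subsidy is that total minus what the contributors paid themselves.

```python
import math

contributions = [25, 100, 4]   # C_i: illustrative contributions to one project

# Total funding = (sum of the square roots of the contributions)^2.
total_funding = sum(math.sqrt(c) for c in contributions) ** 2   # (5 + 10 + 2)^2 = 289

# The matching subsidy is the total minus the raw contributions.
subsidy = total_funding - sum(contributions)                    # 289 - 129 = 160
```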
Yes please! More experimenting with governance.
Seems better than the current system, which as far as I can tell is just 10 if-statements that someone chose without much reason to think it makes sense.

Not sure I parse this sentence. Could you explain in different words?
I think Tenoke thinks that we are talking about the usual post and comment vote system.
As a random side note: I don’t think the system is best described as 10 if-statements, but as a log with base ~3 that was heavily rounded to make it easier to communicate.
Isn’t that what you were going to use initially or at least the most relevant system here to compare to?
No, we never really planned using the existing voting system for deciding on the 2018 review sequence and book. I agree it’s a reasonable thing to compare the system to, and we considered it for a bit, but I don’t think it’s very well-suited to the task (mostly because it doesn’t allow individual users to express enough variance in their assessments).
I assumed Tenoke was referring to the stated plan in the initial review post:
I think that’s unlikely, since he linked to the lines of code on GitHub that calculate a user’s strong-upvote strength.
“Current system < OP’s system”