Multi-winner Voting: a question of Alignment

This is my third (and for now, last) essay about voting theory for rationalists. In the first two, I focused primarily on single-winner voting theory; that is, methods for aggregating group preferences into a final verdict on some choice. Ideally, single-winner methods would be used in cases where decisions are inherently collective, while other mechanisms such as markets are better for cases where decisions are more individual. (As I touched on in the earlier articles, Sen’s theorem puts limits on how precisely that distinction can be made; but that’s not the point here. I’m going to take it as given that there are some cases where collective action is called for and others where action should be left up to individuals, and I don’t want to spend time here arguing about the relative frequency or importance of those two kinds of situations.)

Why isn’t multi-winner voting theory just a generalization of the single-winner kind?

If we’ve covered the best means for collective decisions, and individual decisions are out of the scope of voting theory, then what’s left? Governance. That is, cases of collective action that aren’t a single decision point but an ongoing series of decisions.

Such cases probably aren’t best served by a series of separate single-winner elections, for a few reasons. To begin with, it’s not cognitively efficient; it would be silly for every citizen to need the expertise to make decisions about the minutiae of every policy area. In fact, direct democracy tends to favor negative-sum rent-seeking: small groups extracting concentrated benefits by imposing diffuse costs, merely because they’re the only ones motivated enough to sweat the details. And finally, it’s not predictable: in many cases, governance should be coherent even at the sacrifice of some responsiveness.

In cases of governance, voting is not the final step, but merely one step in a larger process of decision-making. Thus, traditional multi-winner voting theory would look at ways to resolve this by electing a set of representatives to take those decisions.

To the rationalist community, such a multi-step process immediately raises the question of alignment. Just as designers of artificial intelligence should worry about whether their initial goals will be warped into a contrary outcome through the process of design and improvement, so should people like me designing multi-step mechanisms of governance worry about how values are preserved, lest small misalignments in each step add up to major disconnects in outcome.

Proportionality

Of course, if you’re worried about preserving some property over a multi-step process, the first thing to do is to define that property. In this case, the key property is the proportions of decision-makers with each given set of utilities. Proportional multi-winner voting methods are those that are designed to (roughly) preserve these proportions. Thus, collective decisions can be made by smaller groups; the ugly dynamics of mass argument can be replaced by the hopefully-healthier ones of a smaller group. (Though flawed, the concept of a “Dunbar number” is relevant here.)

Note that voting theory itself has nothing to say about how to define the original group whose proportions should be preserved. That is, it doesn’t answer questions of who should be able to vote or how many votes each voter should have, in defining the original proportion. I’d argue that the safest and ultimately best rule today is for each human above a certain age to be allowed to vote; but that’s out of the main scope of this article.

In stating the goal of “proportionality”, I’ve been deliberately a bit vague about defining it. If voters come pre-sorted into comprehensive and mutually-exclusive partisan sets, it’s relatively easy to define “Droop proportionality”, in which each party gets a minimum proportion of seats in the legislature. But what if divisions of opinion are more complicated than that — continuous and/or multidimensional? In that case, there are various desirable proportionality properties, and some degree of tradeoff between them.
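For the easy, party-sorted case, the guarantee can be sketched in a few lines. This is a minimal illustration, assuming the usual integer Droop quota (votes divided by seats plus one, rounded down, plus one) and a made-up electorate; the function names are mine.

```python
def droop_quota(total_votes: int, seats: int) -> int:
    """Smallest vote count that at most `seats` candidates can all reach."""
    return total_votes // (seats + 1) + 1


def guaranteed_seats(party_votes: dict[str, int], seats: int) -> dict[str, int]:
    """Minimum seats each party is owed: one per whole quota it holds."""
    quota = droop_quota(sum(party_votes.values()), seats)
    return {party: votes // quota for party, votes in party_votes.items()}


# Hypothetical electorate of 1000 voters choosing 9 seats.
votes = {"A": 400, "B": 350, "C": 250}
quota = droop_quota(sum(votes.values()), seats=9)
minimums = guaranteed_seats(votes, seats=9)
print(quota, minimums)
```

In this made-up example the guarantees add up to 8 of the 9 seats; who gets the last seat depends on the rounding rule of the particular method.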

As a statistician, I should mention that there is one democratic “voting” method which will satisfy every possible proportionality property, at least “asymptotically” as the legislature grows in size. I’m talking about random sampling, or, as it’s called when used for governance, sortition. By the law of large numbers, a random sample will tend to resemble the underlying population in proportions as to any and all individual characteristics, at least if the sample is large enough. In practice, sortition is rarely used for governance, though advocates of “citizens’ assemblies”, “citizen juries”, “deliberative polls”, and the like are trying to change that.
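A quick simulation shows the law-of-large-numbers point. This is only a sketch: the 50/30/20 population split, the seat counts, and the fixed seed are arbitrary choices of mine, and a real sortition body would also have to deal with people who decline to serve.

```python
import random
from collections import Counter


def sortition(population: list[str], seats: int, rng: random.Random) -> Counter:
    """Draw `seats` members uniformly at random, without replacement."""
    return Counter(rng.sample(population, seats))


# Hypothetical population split 50% / 30% / 20% on some characteristic.
population = ["X"] * 50_000 + ["Y"] * 30_000 + ["Z"] * 20_000
rng = random.Random(0)  # fixed seed so the run is repeatable

for seats in (10, 100, 10_000):
    drawn = sortition(population, seats, rng)
    shares = {group: round(drawn[group] / seats, 3) for group in "XYZ"}
    print(seats, shares)
```

Small samples can miss the true proportions badly; the largest one should land within a percentage point or two of them.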

If we require voting methods to be deterministic, there are still a number of methods that have been designed to ensure proportionality; all such methods are called “proportional representation”. (Since that’s a mouthful and PR has too many meanings already, the best abbreviation is prop-rep.) In general, though perfect proportionality is impossible, most prop-rep methods come close enough that their other, more-pragmatic differences are more important.

Values and beliefs

Of course, representation (proportional or otherwise) is a goal regarding values, but decisions are also based on beliefs; and when it comes to beliefs, the goal should be truth, not representation. The idea of futarchy is about creating a political system that separates values and beliefs, so that values are resolved using a voting method (presumably one where any sub-steps preserve proportionality), while beliefs are resolved using prediction markets. While I’m skeptical of the possibility of designing markets that are immune to bubbles, values-based manipulation, or other systematic distortions, I think that the idea of trying to design a system that respects the separate logics of both values and beliefs is a good one.

Note that the current US voting system actually does try to do this to some extent; it just does a really crappy job. If political parties were groups of people with perfectly homogeneous values, then party primaries would not be the worst way of selecting smart, knowledgeable people with those values and thus of getting a slightly extrapolated volition as compared to mere sortition. Of course, we know that in many real-world cases, primaries are more about ideological litmus tests than qualifications like expertise or intelligence.

Still, that suggests that proportional voting methods should probably include mechanisms for both intra-party and inter-party selection of candidates. In particular, closed-list proportional methods, which offload intra-party selection to some partisan mechanism probably dominated by insiders, are a bad idea.

(A related dichotomy is that between instrumental rationality, which involves both values and beliefs, and epistemic rationality, which involves only beliefs. So this issue can be seen as about finding ways to decrease the misalignment between the incentives for an instrumental and an epistemic rationalist.)

Parties

Another important question about voting methods is the party system they encourage.

First question: should there be parties at all? Though some people would disagree, I’d suggest that parties play an inevitable, and in some regards a positive, role in a political process. Yes, they do have bad effects, such as mind-killing tribal thinking; but they also have good ones, such as serving as useful cognitive heuristics for voters, and possibly allowing intraparty sorting to have more of a focus on qualifications and ability rather than ideology. Furthermore, even if you do believe they are bad on net, getting rid of them is really hard. Metaphorically speaking, if you try to design a voting system that bars the door against parties, you may find that they just make a hole in a load-bearing wall as they force their way in anyway.

Second question: how many parties should there be? Too few, and you get a stagnant “monopolistic” or “duopolistic” system in which zero-sum thinking leads to negative-sum outcomes. (For a real-world example, look at the USA.) Too many, and you encourage politicians who make narrow, single-issue appeals. (For a real-world example, look at Israel.)

Political scientists often view the distinction between few and many parties by considering the representative voting method as just one step in a larger process of forming a majority coalition to make a societal decision. In other words, they speak of systems which encourage few parties as encouraging pre-election coalition-building, and those that encourage many parties as encouraging post-election coalition-building. In my view, it’s good to have a little of both.

A useful way to measure number of parties is “effective number of parties” (ENP). The formula is

$$\mathrm{ENP} = \frac{1}{\sum_i s_i^2},$$

where s_i is the size of party i as a fraction of all voters. Intuitively, this is the reciprocal of the fraction of voters in the party of the average voter (thus naturally weighting larger parties more). In other words, if the average voter’s party size is ¹⁄₃ of the electorate, then ENP is 3. I’d aim for something between 3 and 4 as ideal.
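Here is a small sketch of that calculation, taking ENP to be the reciprocal of the sum of squared party shares (which is what the “average voter’s party” description works out to). The party systems below are invented for illustration, and the function name is mine.

```python
def effective_number_of_parties(shares: list[float]) -> float:
    """Return 1 / sum(s_i^2), after normalizing the shares to sum to 1."""
    total = sum(shares)
    normalized = [s / total for s in shares]
    return 1.0 / sum(s * s for s in normalized)


# Hypothetical party systems, given as fractions of all voters.
two_equal = effective_number_of_parties([0.5, 0.5])
three_equal = effective_number_of_parties([1 / 3, 1 / 3, 1 / 3])
two_big_two_tiny = effective_number_of_parties([0.45, 0.45, 0.05, 0.05])
print(two_equal, three_equal, round(two_big_two_tiny, 2))
```

Note how the last system, with four parties on paper, scores only about 2.4: the two tiny parties barely register, which is the sense in which the measure weights larger parties more.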

I’d argue that choosing a voting method that tends towards such a moderate ENP will also tend to encourage better rationality within the legislature. As I said above, in a two-party system, with only one ideological dimension, winning or losing the eternal battle against the other side is all that matters, and so norms of debate (including rationality norms) go out the window. And a highly fragmented world of single-issue parties actually has exactly the same problem; since each party is focused on just one issue, they have no reason to subscribe to overarching norms. It’s only when there are more than two parties which each care about more than one issue that norms become selfishly worthwhile to each; though the norms might work against them on any one issue, insofar as they’re positive-sum norms they will tend to work for each party’s interests more than they work against them.

Voter strategy: free riding and vote management

In essentially all proportional methods (except weighted/proxy systems), an individual voter has an incentive not to vote for a candidate whom they know will win anyway, in order to avoid having any of their voting power “used up” by that foregone conclusion. But even though this incentive exists to some extent across many methods, its strength varies. All else equal, it’s better to look for methods where this incentive is relatively weak.

On a collective level, this incentive is somewhat self-limiting. That is, if nobody votes for a popular candidate just because they’re a “sure thing”, then that candidate won’t win after all. So collectively this incentive isn’t so much for “free riding” as for “vote management”: giving each candidate exactly the minimum number of votes they need to win. For instance, a party might try to equalize the number of votes that favor each of their candidates by instructing voters to vote based on their birthday.

Pragmatics (1)

So from the above, we’re looking for a voting method that’s reasonably proportional; that allows voter input on both within-party and between-party choices; that encourages a moderate number of parties; and that has a relatively weak free-riding incentive. That is an underdetermined set of constraints; there are a number of methods which do all of those to (what I’d consider) a pretty good degree. To choose between those proposals, we can add in pragmatic questions. Which methods are easiest for voters? Which are easiest to count? Which are likely to be most politically viable (which includes being non-disruptive to incumbents, at least, when disruption doesn’t serve a useful purpose for any of the values above)? Which have the best track record?

Proportional Method Lego

Most proportional methods can be thought of as combinations of a few basic building blocks:

Greedy assignment and deweighting. Choose winners one at a time according to who has the “most votes”, then reweight the ballots that helped them get elected so that some of their voting power is used up. There are various reweighting schemes that work. Say 40% of the ballots are all among the strongest ballots helping elect the same 3 winners out of a total of 9 seats. After each of those three wins in turn, they can be reweighted to 20, 13.3, 10 (dividing the original 40 by 2, 3, 4); to 13.3, 8, 5.71 (dividing by 3, 5, 7); to 30, 20, 10 (subtracting a quota of 10 per winner); or to 28.89, 17.78, 6.67 (subtracting a quota of 11.11 per winner). All of these schemes, if applied to all groups, will end up with a proportional result; they differ in whether they round leftovers towards larger or smaller parties and in the strength of their free riding incentives. Note that greedy algorithms are actually approximations of more-complex globally-maximizing algorithms. Most voting methods do not use global maximizers, simply because they’re harder to explain.
Elimination and transfer. Eliminate “losers” and transfer their votes based on some implicit or explicit preference order. Note that when combined with the above, this sequential elimination is an extra, unnecessary greedy approximation. In the single-winner case, it’s what leads to the center squeeze problem.
Descending threshold. Instead of elimination and transfer, you can progressively lower some threshold, and count ballots as supporting all candidates they rate above that threshold. Even though one ballot may count as supporting multiple candidates, it will still be deweighted if any of those candidates actually wins, so it does not get any additional voting power. This is theoretically superior to elimination and transfer, but the difference is usually small in practice, and this has far less of a track record of real-world use.
Districts (single- or multi-member): Simplify matters by dividing up into sub-elections. These may be entirely separate, or unified by mixed-member or biproportional mechanisms (below). Traditionally, the variable name used to denote district magnitude is “M”.
Mixed member. Some seats are assigned by a fully nonproportional system (such as FPTP by districts), while others are later assigned by a proportional system so as to adjust the proportions. This is often accompanied by a dual ballot; for instance in Bavaria, you may vote for one candidate in your own district and one candidate outside your district but in your region.
Biproportionality. Results are constrained so that there is exactly a certain number of each kind. (This is akin to stratified sampling in survey design.) For instance, there could be a rule that there should be exactly 1 winner per equal-population district, or that there must be at least X% of winners of each gender, or that certain seats are reserved for a native ethnicity.
Ranked ballots. Voters rank candidates in preference order.
Delegation. Each candidate makes a (partial?) ranking or rating of the other candidates, and this is (optionally?) used to fill in preferences on ballots cast for that candidate. Most proposals have candidates pre-register preferences, to avoid corruption and so that voters can use this information when casting their ballots, but in theory it would be possible to allow candidate preferences to be set after the election.
Pooling. Similar ballots (for instance, those that prefer a given candidate, or those that prefer a given party) are averaged and then counted together. This sacrifices some information about the details of each ballot, in order to make counting summable from the precinct level. Note that without delegation and/or pooling, proportional methods are not summable, which can present practical problems in vote-counting such as chain-of-custody.
Open party lists. Essentially, this means that there are separate mechanisms for assigning each party an appropriate number of seats, and for choosing which of that party’s candidates get those seats. This can allow for simpler ballots; for instance, a voter can choose a single candidate and that can be counted both as a party vote in a proportional system and as a vote for that candidate in a nonproportional within-party ordering. (Note that open party lists can be seen as just a special case of pooling, but since they’re a common idea, I’m listing them separately.)
Party thresholds. That is, parties with under a given percentage (such as 5%) are not given any seats. This is a mechanism to stop “fringe parties” from winning proportional seats; in other words, to keep the ENP from growing too large. But it’s very much a blunt instrument, especially if votes for sub-threshold parties can’t transfer to other similar parties. In real-world elections, party thresholds and “divide and conquer” have let parties with as little as 38% of the popular vote get legislative majorities in supposedly “proportional” methods, with serious long-term consequences.
Individual local thresholds. Individual candidates with under a given percentage (such as 25%) of votes from their local district are eliminated. Since this usually is used in combination with vote transfers, it’s much less of a blunt instrument than party thresholds. For instance, a party with just 15% of the vote region-wide will probably have some candidates with over 25% of the vote in their district; these candidates will get transfers from their co-party-members and thus probably win seats. And even if the party gets no seats, their votes will be transferred to a similar party, not just be wasted.
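The reweighting arithmetic in the deweighting block (a 40% bloc helping elect 3 winners out of 9 seats) can be recomputed with a short sketch. I am assuming the four schemes are the two standard divisor rules (D'Hondt and Sainte-Laguë) and the two standard quota rules (Droop and Hare); those labels and the function names are mine, not part of the list above.

```python
def divisor_weights(share: float, wins: int, step: int) -> list[float]:
    """Bloc weight after each win when it is divided by 1 + step * wins_so_far."""
    return [share / (1 + step * k) for k in range(1, wins + 1)]


def quota_weights(share: float, wins: int, quota: float) -> list[float]:
    """Bloc weight after each win when one quota is subtracted per winner."""
    return [share - quota * k for k in range(1, wins + 1)]


share, wins, seats = 40.0, 3, 9  # percentages of all ballots
schemes = {
    "dhondt": divisor_weights(share, wins, step=1),        # divide by 2, 3, 4
    "sainte_lague": divisor_weights(share, wins, step=2),  # divide by 3, 5, 7
    "droop": quota_weights(share, wins, 100 / (seats + 1)),
    "hare": quota_weights(share, wins, 100 / seats),
}
for name, weights in schemes.items():
    print(name, [round(w, 2) for w in weights])
```

The divisor rules shrink the bloc multiplicatively and never reach zero, while the quota rules charge a fixed price per seat; that difference is one source of the differing rounding and free-riding behavior.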

Combining the above building blocks, we can build various voting methods:

Regional open list: Open list (pooling by party). Districts, typically with 10-40 seats each. For the proportional backbone, pooling means that many different counting rules give the same outcome; all of them can be seen as greedy/deweighting methods.
STV: (Single Transferable Vote) Districts typically M=5 or so. Ranked ballots, deweighting, and elimination. Used in Ireland and Malta, and at some levels in Australia.
MMP: Mixed member: FPTP + open list. (Good example: Bavaria. Bad example: Wales.)
DMP: (Dual Member Proportional) Mixed member: FPTP + biproportional open list, so that there are exactly 2 winners per district.
LPR: (Local Proportional Representation) Biproportional + STV.
PLACE: (Proportional, Locally-Accountable Candidate Endorsement) Preferences are set by a hybrid of delegation and individual pooling. There’s an individual local threshold of 25%. Seats are biproportional, so that there’s exactly one winner per district. The back-end method is STV.
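To make the Lego metaphor concrete, here is a minimal sketch of the STV combination (ranked ballots, deweighting, elimination). It is not any jurisdiction's official rule set, and it is not PLACE: I am assuming an integer Droop quota, fractional surplus transfers, and alphabetical tie-breaking, and the ballots are invented.

```python
from fractions import Fraction


def stv(ballots: list[list[str]], seats: int) -> list[str]:
    """Elect `seats` winners from ranked ballots (most-preferred first)."""
    hopeful = {c for ballot in ballots for c in ballot}
    weights = [Fraction(1)] * len(ballots)      # remaining power of each ballot
    quota = len(ballots) // (seats + 1) + 1     # integer Droop quota
    winners: list[str] = []

    while len(winners) < seats and hopeful:
        if len(hopeful) <= seats - len(winners):
            winners.extend(sorted(hopeful))     # no contest left: seat them all
            break
        # Ranked ballots: each ballot counts for its top remaining choice.
        tallies = {c: Fraction(0) for c in hopeful}
        top = []
        for ballot, weight in zip(ballots, weights):
            choice = next((c for c in ballot if c in hopeful), None)
            top.append(choice)
            if choice is not None:
                tallies[choice] += weight
        leader = max(sorted(hopeful), key=lambda c: tallies[c])
        if tallies[leader] >= quota:
            # Deweighting: supporters keep only the surplus above the quota.
            keep = (tallies[leader] - quota) / tallies[leader]
            weights = [w * keep if t == leader else w
                       for w, t in zip(weights, top)]
            winners.append(leader)
            hopeful.remove(leader)
        else:
            # Elimination and transfer: drop the weakest candidate.
            hopeful.remove(min(sorted(hopeful), key=lambda c: tallies[c]))
    return winners


# Hypothetical electorate: 60 "L" voters and 40 "R" voters, 3 seats.
ballots = (
    [["L1", "L2", "L3"]] * 30 + [["L2", "L1", "L3"]] * 20
    + [["L3", "L2", "L1"]] * 10 + [["R1", "R2"]] * 25 + [["R2", "R1"]] * 15
)
result = stv(ballots, seats=3)
print(result)
```

With these ballots the 60/40 split yields two L winners and one R winner, even though no voter coordinated with any other; the quota and transfers do the work.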

Pragmatics (2): I think PLACE is awesome

I’m going to switch from just explaining multi-winner voting theory to advocating for a specific method, so I should start out by explaining where I’m coming from. I’m a US activist for voting reform, and I’m on the board of the Center for Election Science (electology.org). My object-level politics, and my social milieu, tend to be pretty much on the left of the spectrum, but I also have real meta-level politics in favor of democracy. Ask me about any given issue and I’ll happily explain why my own views are smarter than those of the median voter; but across all of those issues I know that the crowd is probably wiser than I am as often as not.

I’ve been thinking seriously about voting theory for over 20 years, and it’s the main reason I am now getting a doctorate in statistics. In that time, I’ve designed many voting methods. The ones I consider best (3-2-1, PLACE, EPH, and SODA) are designed to optimize on the characteristics I think are important. When I argue for these methods, of course I’m biased. But I’d suggest that when I argue “My method is best normatively because it optimizes characteristic X”, you should question my bias more by disputing whether X as I’ve defined it is important than by wondering whether the method actually optimizes X.

So what do I think is important in a practical proposal for a multi-winner method? It should:

Minimize wasted votes — votes that don’t help elect a candidate. (Under my rough definition of wasted votes, optimizing this implies proportionality.)
For those votes which aren’t wasted, maximize “similarity” between voter’s preferences and candidate’s qualities.
Having looked at many voting methods and many scenarios for each, I find that giving voters breadth of choice does a better job at this than giving them depth of choice. Say I’m voting in a California congressional election, with around 50 seats in play. If I am free to choose my favorite candidate statewide, and then if they lose that vote is transferred based on their preferences, the mismatch between my preferences and theirs introduces less error than if I am able to cast a full ranked ballot in a 5-seat district with 10 times fewer choices.
Be simple for voters
Ranked ballots for more than about a dozen candidates are intolerably complex for most voters.
Retain perceived “advantages” of FPTP, including some guarantees of local representation, as well as a clear concept of “my representative”.
Encourage a moderate number of parties
Have a relatively weak free-riding incentive
Be non-disruptive and otherwise “politically viable”.
This is obviously a judgment call, but I think that a method that is any threat to an incumbent of average popularity is a non-starter. Insofar as outcomes are different, the losing incumbents should be among those with below-average popularity.
Have a precinct-summable counting process
This is useful for transparency of outcomes and for fraud resistance.

PLACE voting was designed with these characteristics in mind; it does reasonably well on all of them. All other methods I know of fail significantly on several characteristics. (In fact, it took me decades of learning about voting theory, followed by almost a year of concentrated design work for hours a week, to settle on PLACE.)

Down in the comments, before I finished this article, there was already a comment criticizing PLACE (from somebody who knows me from elsewhere). I understand that the criticism, that voters may find delegated methods distasteful, is real. I don’t think it’s as serious as it would be to fail on the other characteristics above.

If you’re interested in activism on this, contact me. PLACE is compatible with the US constitution and current law, so it could be done by either state or federal legislation. I’m looking to get this passed somewhere (Somerville, MA?) at a municipal level first (there’s a nonpartisan version that’s appropriate). My email is firstname dot lastname at google’s public email service. I’d also encourage you to support the Center for Election Science. Even if you’re in the UK or Canada (especially BC), I can help hook you up with local movements for reform.