epistemic/ontological status: almost certainly all of the following -
a careful research-grade writeup of a genuinely kinda shiny open(?) question in theoretical psephology that will likely never see actual serious non-cooked-up use;
dedicated to a very dear cat;
utterly dependent, for the entirety of the most interesting parts, on definitions I have come up with and results I have personally proven partially using them, which I have done with a professional mathematician’s care; some friends and strangers have also checked them over;
my attempt to prove that something that can reasonably be called a maximal lottery-lottery exists;
my attempt to scavenge what others have left behindand craft a couple of missing pieces, and then to lay out a blueprint for how it could begin to work;
not a 30-minute read
the first half of something incomplete
Maximal Lottery-Lotteries: Far More Than It Ever Occurred To You To Want To Know
This post is mostly a distillation/concentrated treatment of the Maximal Lottery-Lotteries Sequence, though I define an important property and prove an important existence result at the end. It’s the first of two posts in a sequence, the second of which is linked here.
The Maximal Lottery-Lotteries sequence details why anyone should care about sortitive/lottery-using electoral systems even without any particular hope of getting to implement it.[1] It also ends in early November of 2022 with the author and a handful of other technical alignment notables all honorably giving up on the shiny math problem with varying degrees of explicitness. This post is meant as a follow-on to the sequence, and its sequel is, too. I also end up drawing heavily on the Geometric Rationality sequence for its central tools.
As per my usual policy, if I’ve managed to misunderstand anything or write anything up unclearly or incorrectly, or you have any questions about what I’ve written, please comment below or at https://www.admonymous.co/lorxus and I’ll fix it/reply to it when I can.
If you’re reading this post for the first time, you might want to keep the notation reference on hand. If you’re relatively inexperienced with reading text which treats dense mathematical notation on par with English prose, slow down and make sure you understand what each mathematical expression really means or what object or type of object it refers to before continuing. “Is it a nonce-object instantiated just for the proof, or does the notation suggest anything more about its type or identity?” If you can answer that kind of question without much effort, it’ll make understanding any mathematically-dense text much easier. Mathematical notation is extremely compact and precise; if I tried to write all the below purely in prose, it’d be three times as long and a tenth as clear—but all compression comes with compute costs, and math notation is no exception.[2]
Let C be a set of candidates, VC≅Hom(C,[0,1]) the voterspace for C, which is also the set of all utility functions on C (or candidate preferences), and V∈ΔVC an electorate (or voterbase), which is a partition of unity tagged by candidate preferences which we may think of as the “voter share” with each candidate preference.[3] Then a voting systemf:ΔVC→ΔC is a function taking a distribution over voterspace on C to a distribution over C.[4] We write fC(V) for the outcome of voting system f with choice of candidates taken from C and polling electorate V; we’ll often suppress the subscript when the candidate set is clear.
This is classically subject to the following four constraints:[5]
Candidate finiteness: |C|<∞, that is, the set of candidates is finite (else we’ve got lost and ended up somewhere in social choice theory).
Ordinality: f is invariant under precomposition by preorder-respecting functions applied to utility functions—it only uses the preorders on candidates implied by the utility functions (else we need cardinal information, as per score voting, or we allow some preference collapse, as per approval voting).
Determinism: fC(V) always assigns measure 1 to some ci∈C, no matter what C or V are (else we have a sortitive or otherwise nonclassical voting system).
Regularity: the input distribution to f is always the uniform distribution on some finite set (which is a regularity condition to avoid ghost candidates, unequal votes, and other weird possibilities we don’t care about).
We can generally ignore regularity, and (spoilers) we’ll soon be relaxing determinism, too. At the end, we even get rid of ordinality.
Now for a few desirable properties voting systems can have. We’ll phrase them in our language of functions and voting outcomes, which lends itself so much better to generalizing electoral outcomes from deterministic outcomes to lotteries:
Condorcet: Whenever some candidate beats every other candidate in a head-to-head matchup, that candidate should win overall.
Equivalently: assume that for some c∈C, it is the case that ∀d∈C,Pv∼V(v(c)>v(d))>12 whenever d≠c. Then a Condorcetf satisfies fC(V)(c)=1.[6]
Equivalently: assume that we can separate C as C=S⊔D such that ∀c∈S,d∈D,Pv∼V(v(c)>v(d))>12. Then a Smithf satisfies fC(V)(S)=1, that is, fC(V)∈ΔS.
Consistency: Two distinct voterbases each giving the same result as each other should mean that after joining the voterbases, the joined voterbase should still give the same result as before.
Equivalently: assume that fC(V)=fC(W)=D∈ΔC. Then for all choices of intermediacy parameter p∈[0,1], a consistentf satisfies fC(p⋅V+(1−p)⋅W)=D.
Participation: It should never be the case that a voter can predictably do themself better by not voting. This is notably a limiting case of consistency, as one subpopulation becomes increasingly small and certain.
Equivalently, let W∈ΔVC be an electorate, v∈VC a voter, p∈[0,1] an intermediacy parameter. Then a participatoryf satisfies v(fC(p⋅v+(1−p)⋅W))≥v(fC(W)), where we interpret p⋅v+(1−p)⋅W as the p-parametrized mix of W and the element V∈ΔVC that happens to put measure 1 on v, or equivalently as something for which sampling from it means first choosing between v with probability p and otherwise a draw from W with probability 1−p, and where for a lottery D∈ΔC, we interpret v(D)=Ed∼Dv(d).
Dupe-resistance:[7] If there’s two or more candidates in the dupe set D={d1,d2,⋯,dn} where for all voters v∈V and candidate c∈C∖D, there is a d∈D such that v is indifferent between d and each of the di, and v always has the same ordinal preference between c,di for every di, then the outcome of the election should be identical, for every candidate outside the dupe-set D, to if the election were rerun with a single candidate d replacing all of D.
Equivalently, let C=ˆC⊔{d}⊔D, and suppose that ∀c∈ˆC,di∈D, v∈V∈ΔVC, we have v(c)>v(di)↔v(c)>v(d) and v(c)<v(di)↔v(c)<v(d). Then a dupe-resistantf satisfies fC(V)(c)=fC∖D(V)(c) for all c∈ˆC, that is, c still wins exactly as much as if there were just no d-dupes.
Famously, Arrow’s impossibility theorem says that for a classical voting system (i.e., one that deterministically spits out a single candidate at the end, among other requirements), you can’t have all of these. You can’t even have the Condorcet criterion, dupe-resistance, and consistency at the same time—isomorphic to Arrow’s usual sorrowful guarantee. You can’t even have both participation and the Condorcet criterion at the same time! A pall of soot falls over the Sun, and democracy dies to thunderous applause.
Also, I’ve cheated a little here, because that’s technically not the standard definition of the Smith criterion. That’s because the real standard definition is actually just wrong for our purposes—it always talks about such a voting system always picking its unique winner from the Smith set. We recall that voting theorists usually assume determinism of outcomes, which we have long since made up our minds to discard—if we were to put that assumption back, we recover the classical definition of the Smith criterion.
But look—phrasing it this way means that the Smith criterion generalizes itself when we pass from deterministic elections to lotteries, and it even gets much more powerful and flexible when we do! We should think of the Smith condition as being “the Condorcet condition, but one which fails more gracefully and has a reasonable plan in case there’s no Condorcet winner”. For deterministic elections, that additional strength doesn’t count for much more than some annoying noise in the outcomes. However, this weakness—that if we force the Smith set to do all the final choosing for us, we then have to blame it for sometimes producing unsatisfying outcomes—is one we already have the tools to trivially resolve. We simply leave it to a lottery, and the Smith criterion’s issues fall away. Sortitive voting systems are the Smith criterion’s true home.[8]
The Smith criterion’s precondition is strictly weaker than that of the Condorcet criterion (because if there’s a Condorcet winner, then the Smith set is a singleton containing just that candidate) and its backup guarantees are strictly stronger than that of the Condorcet criterion (some admittedly weaker control over the outcome of the election, as opposed to nothing at all) so overall the Smith criterion is a more empowering property than the Condorcet criterion is. That said, it’s not by all that much—for example, in a worst-case scenario where the voterbases’s collective preferences over candidates is maximally-nontransitive, the Smith set is just the entire candidate set.
We start by relaxing determinism and ceasing to bother tracking regularity from among the assumptions in the previous section; the immediate consequence of this is that fC(V) is in general a lottery D∈ΔC rather than an individual candidate (or in reality a measure-1 chance of drawing some specific candidate) as might have been tacitly assumed in the previous section.
Probabilistic outcomes also require actually formulating a definition of what it means for one candidate to beat another. Let A,B∈ΔC be candidate-lotteries, V∈ΔVC an electorate, and for the sake of notation take draws a∼A,b∼B,v∼V. Then A dominatesB (and we write A⪰B) if Pa,b,v(v(a)>v(b))≥Pa,b,v(v(b)>v(a)). In slightly more prose, we take independent draws a,b,v from A,B,V, and then compare v(a),v(b)∈[0,1], and if it’s more often or more likely the case that (after we’ve drawn all our samples) v prefers a to b, then A dominates B. Importantly, dominance is not a necessarily transitive relation, but it still only requires ordinal preference rankings and so our assumption of ordinality is safe for now.
A maximal lottery is some lottery M∈ΔC such that for all candidate-lotteries L∈ΔC,M⪰L.
Theorem: (Maximal lotteries exist): Let C be a set of candidates, V∈VC an electorate. Then there exists at least one candidate-lottery M∈ΔC such that for all choices of candidate-lottery L∈ΔC, M⪰L.
(Proof) Consider the following two-player zero-sum game with finitely many strategies between Alice and Bob: Alice and Bob pick candidates a,b∈C and draw a voter v∼V∈VC. Then they get reward r(A)=Pv(v(a)>v(b))−Pv(v(b)>v(a)),r(B)=−r(A). By Nash’s equilibrium theorem, this game has at least one Nash equilibrium (and it turns out that we can in general effectively compute the set, and that the set is convex). Alice and Bob must both play Nash equilibrium strategies, else be dominated, but the definition of dominance is identical for pick-strategies and candidate-lotteries, and so Alice and Bob’s Nash equilibria must in fact be maximal lotteries.
Additionally, M is Condorcet, consistent, dupe-resistant, and participatory. Indeed, M is characterized as the unique candidate lottery which is Condorcet, consistent, and dupe-resistant. We’ll only prove the first two of those as both the most important and most surprising pair, in light of Arrow’s impossibility theorem, and because the proof of dupe-resistance actually follows from the below lemma that maximal lotteries are Condorcet. There’s nothing particularly fancy or clever going on here, just a chase through some linear-algebra and a little bit of definition-prodding:
Lemma: (Maximal lotteries are Condorcet) Let c∈C be the Condorcet winner; let A,B∈ΔC such that A is a maximal lottery, B picks c with probability 1, and we assume for the sake of a contradiction that A≠B. Since c is the Condorcet winner, Pd∈C∖{c},v∼V(v(c)>v(d))>12, so that Pa∼A,v∼V(v(c)>v(a))>12⋅Pa∼A(a≠c). But because we know that A⪰B, we have Pa∼A,v∼V(v(a)>v(c))≥Pa∼A,v∼V(v(c)>v(a)) so that Pa∼A,v∼V(v(a)>v(c))>12⋅Pa∼A(a≠c) as well. We’re allowed to condition on a≠c here, because by assumption, A≠B, so that Pa∼A(a=c)≠1;Pa∼A(a≠c)≠0, but then Pa∼A,v∼V(v(c)>v(a)),Pa∼A,v∼V(v(a)>v(c))>12, which adds up to more than 1 and is thus impossible. Thus the maximal lottery always picks any Condorcet winner with probability 1, if there is one.
Lemma: (Maximal lotteries are consistent) Let V1,V2∈ΔVC be electorates with A a maximal lottery for both, such that for any other B∈ΔC, A⪰B for V1,V2 both. Fixing an arbitrary intermediacy parameter p∈[0,1] once and for all, let V3=p⋅V1+(1−p)⋅V2, such that to sample from V3 is equivalent from first choosing between V1 with probability p and V2 with probability 1−p, and then drawing a voter from Vi. Fix draws a∼A,b∼B once and for all. Then Pv∼V3(v(a)>v(b))=p⋅Pv∼V1(v(a)>v(b))+(1−p)⋅Pv∼V2(v(a)>v(b)). Now, A⪰B, so by dominance, Pv∼V1(v(a)>v(b))≥Pv∼V1(v(b)>v(a)) and Pv∼V2(v(a)>v(b))≥Pv∼V2(v(b)>v(a)), so that Pv∼V3(v(a)>v(b))≥Pv∼V3(v(b)>v(a)). Thus the maximal lottery is consistent for any choice of parameter p.
With a tiny bit of effort, I can extend this result to the Smith criterion. Unlike the previous lemmas, I suspect that this is an original result, if a modest one. As above, there’s nothing too fancy going on here, just some linear-algebra manipulation and definition-prodding:
Lemma: (Maximal lotteries are Smith) Let C=S⊔D such that ∀c∈S,d∈D, Pv∼V(v(c)>v(d))>12. Let A∈ΔC be a maximal lottery. We assume for the sake of contradiction that A∉ΔS. Let d′∈D be the candidate from the dregs set satisfying d′=argmaxd∈Df(V)(d), where by assumption, f(V)(d)>0. Now for c∈C and arbitrary choice s∈S, we construct the lottery B given by B(d′)=0, B(s)=A(s)+A(d′), and B(c)=A(c) if c≠d′,s - that is, the lottery that outputs the same things A does, except any measure A would call to be given to candidate d′ is instead given to candidate s from the Smith set. We take samples v∼V,a∼A,b∼B; rather than try to directly calculate P(v(a)>v(b)), P(v(b)>v(a)), we find that we can pass to the difference between A,B - that is, P(v(a)>v(b))≥P(v(b)>v(a)) iff B⪰A iff P(v(d′)>v(s))≥P(v(s)>v(d′)), a contradiction. Including non-Smith candidates gets a lottery dominated; therefore, a maximal lottery must be Smith.
It’s worth noting that on the other hand, a maximal lottery need not assign positive measure to every candidate in the Smith set, as far as I can tell. And why should it? Maybe one of the candidates in the Smith set is vastly worse than any other. Maybe three of the candidates in the Smith set are all vastly worse than all the rest, but they happen to form a Condorcet cycle among themselves and the very worst one overall has a tiny edge over some other candidate in the Smith set, so none of them is quite dominated by the entire Smith set. Being a Smith candidate is no guarantee of being any good as a candidate! It’s also worth noting that being Smith, consistent, and dupe-resistant is equally much a ~unique characterization of a maximal lottery as being Condorcet, consistent, and dupe-resistant, since Smith implies Condorcet.
It’s worth noting at this point that if we want to make use of weighted-geometric-mean-based tools, we also technically have to give up on homogeneity, because now, we really do (and should!)care at least a tiny bit about the absolute number of voters in the populations we’re joining together. In reality, we only care about which subelectorate is smaller (or less powerful, or less important, or less significant, or...) than the other one, and by what multiplicative factor.[11] Sometimes the factor is small enough to work with, and other times the factor is so large that we should basically just check if a certain statement holds for all weight-values above some real number. Ultimately, though, we’ll ignore the weakening of homogeneity, given that we’re not doing anything overly bad/non-homogeneous[12] - everything we care about is still completely scale-invariant, as long as we apply the same scale transformation to all voterbases—and thus we preserve most of the homogeneity property, requiring only the added information of electorate size/population/”power factor”, and additionally require that the parameter p is always the share of the voterbase power factor of the smaller/weaker voterbase with respect to both voterbases, such that 0<p≤1. Call it quasi-homogeneity if you want—if you were to fix a positive integer once and for all and duplicated each voter that many times, the outcome of the election shouldn’t change.
The geometric mean of two numbers is given by Gx∈{A,B}x=√A⋅B; similarly, the geometric mean of a finite set of numbers is given by Gx∈{xi}ni=1x=n√Πixi. The p-weighted geometric mean of two numbers is given by Gp(A,B):=√A2p⋅B2−2p=Ap⋅B1−p, where 0<p≤1; we’ll use this notation to simplify expressions later on.[13]
This allows us to straightforwardly define a weighted-collective-utility function on the join of two voterbases (or of a voterbase with a single voter) in a way that has some hope of avoiding the usual failure modes of another possible approach which makes use only of arithmetic means:
Let V,W∈ΔVC be voterbases, u∈VC a single voter.
Let X=V∪W, with V to be weighted p times as strongly as W for some discretionary choice of intermediacy parameter 0<p≤1. Then Gx∼Xx(f(X)):=√Ev∼Vv(f(X))2p⋅Ew∼Ww(f(X))2−2p=Ev∼Vv(f(X))p⋅Ew∼Ww(f(X))1−p=Gp(V(f(X)),W(f(X))).
Let Y={u}∪V. Then Gy∼Yy(f(Y)):=limp→0√u(f(Y))2p⋅Ev∼Vv(f(X))2−2p=limp→0u(f(Y))p⋅Ev∼Vv(f(X))1−p=limp→0Gp(u(f(X)),V(f(X))).
Thankfully, in practice we may always assume that we’re always dealing with two voterbases of vaguely comparable sizes, rather than joining a single voter or a finite voterbase to a potentially infinite one. Why? If we join a finite voterbase to an infinite one and have chosen to weight the two proportionally to their population sizes, then we just throw out the results from the finite voterbase, and if either of the voterbases is uncountable, Arrow’s impossibility theorem no longer holds, so there’s not much point to any of this.[14] So either the voterbases we want to join are both of finite size, in which case we have an easy principled answer—proportional weighting—or they are both countably large, and we should have already chosen how we want to weight the two against each other (with p=12≅q=1 as the principled default choice).
Lottery-Lottery Outcomes and the Lottery-Smith Criterion
A lottery-lottery is what it’s called: a lottery where some or all of the outcomes you can draw are themselves often an instruction to draw from a lottery. Some things change, and more things don’t. A crucially important thing that does not change is the type signature of an electorate, which is still that of a distribution over (or again, if you like, multiset of) functions from the set of candidates to [0,1]; now, voters v∈V - who still have the type signature of a single function fron the candidate set to [0,1] - vote for lottery-candidates in ΔC based on the voter’s expected utility Ec∼Cv(c).
Such strength as we require lottery-lotteries to have will come at a cost. When we passed from classical voting theory to sortitive voting theory, we lost determinacy and discarded regularity. How much will Mathematics demand of us in weakening of properties of candidate-lottery-lotteries from candidate-lotteries, if we want for lottery-lotteries to be an even-handed function for social-preference aggregation? And can we claw anything back?
The first thing to go is the assumption of ordinality. As in the linked illustrative example of a 3-person electorate points out, ordinal preferences are no longer enough to tell us what lotteries should go in a desirable lottery-lottery. Also, if we tell our voters to rate candidates cardinally, we don’t want to have the failure mode where a voter getting an expected utility of 0 means anything other than that their least-favorite candidate is a guaranteed outcome—it should be very hard to get an expected utility of exactly 0. So we’ll replace it with the following slightly weaker regularity condition:
Weak ordinality:[16] Let V∈ΔVC be a voterbase. Then every v∈V is injective—that is, for all c≠d∈C, v(c)≠v(d), so that each voter could in principle rank all candidates linearly.
Next, after Scott, we choose to overtax the Condorcet criterion to empower dupe-resistance:
Lottery-Condorcet: Let V∈ΔVC be an electorate. If for some c∈C and all D∈ΔC, P(v(c)>Ed∼Dv(d))>12 whenever d≠(1c), a lottery-Condorcetf satisfies fC(V)(c)=1.
This is a stronger precondition than the classical Condorcet condition, checks for which will thus come up less often and have less of an effect on outcomes: now the putative lottery-Condorcet-favored candidate has to beat every single dupe-trenchcoat[17] on top of all the real candidates before it can be crowned Obvious Correct Champion.
Dupe-trenchcoat-resistance: Let D∈ΔC be a lottery-candidate, and assume all voters v∈V∈ΔVC vote for candidates in C=C∪{D} based on Ec∼Cv(c), where the real candidates can be interpreted as being given by distributions that assign probability 1 to a single candidate. A dupe-trenchcoat-resistant voting system f is one for which the outcome satisfies fC∪{D}(V)(c)=fC(V)(c) for all c∈C.
This is a more powerful condition to rely on always being true than the classical dupe-resistance criterion: now the voting system must be able to ignore all dupe-trenchcoats on top of still needing to be able to ignore any dupes of single candidates. Indeed, as the original writeup remarks, passing to a maximal lottery-lottery is just like adding all possible lotteries over candidates to the set of candidates,[18] and then running a maximal lottery over the result. For a dupe-trenchcoat-resistant voting system, it doesn’t matter how many suspiciously-trenchcoated dupe-lotteries enter the election—the outcome of the election remains invariant.
Finally, we give up the game-theoretic/strategyproof definition of dominance, which was as follows:[19]
Let A,B∈Δ2C be lottery-lotteries, and let V∈ΔVC. Then taking samples A∼A,B∼B,v∼V for notation, we say that AdominatesB if PA,B,v(Ea∼A,b∼B(v(a)>v(b))≥PA,B,v(Ea∼A,b∼B(v(b)>v(a)), where we note that A,B are lotteries.
But why? Why would we accept such an act, and how could we ever choose something to replace so elegant and so innately appealing as the game-theoretic definition of dominance? The problem lies in the kinds of tradeoffs the strategyproof definition would endorse making, which make perfect sense from a game-theoretic perspective but which would be a terrible idea to have a voting system endorse. To see this, let C={A,B,C}, and let V∈ΔVC be given by 0.51:[A:0.6,B:0.55,C:0],0.48:[A:0,B:0.01,C:1],0.01:[A:0,B:0.55,C:1]. Then under a game-theoretic definition of dominance, picking candidate A with probability 1 would be a maximal lottery (and any lottery-lottery that reduced to it would also be maximal).
But this seems absurd! If we want the dominance of one lottery-lottery over another to mean that we should prefer that the voterbase get the first lottery-lottery outcome over the second, then it seems desirable for that that sense of electoral dominance to be some relatively mathematically simple one which happens to be both utilitarian—preferring outcomes where everyone is on the whole cardinally better off, especially if that’s true for each voter—and egalitarian—dispreferring outcomes where anyone is much worse-off, especially if it’s cheap in overall cardinal utility to make the worst voter less badly off. Manifestly, the game-theoretic definition of dominance, while elegant and inherently strategyproof, does not do these things very well if at all. If we’re lucky, there might even be some natural scoring rule that marks the function of voters’ expected utilities that our notion of dominance relies on as optimal for proportional representation.[20] Luckily for us, just such a function exists—the geometric mean!
Note that even passing to arithmetic means wouldn’t suffice to rule out this setup:[22] always electing A gives V an expected utility of 0.306, while (for example) always electing B gives V an elected utility of 0.2908. On the other hand, taking geometric means means electing A gives V a geometric expected utility of 0.0, and electing B gives V a geometric expected utility of ∼0.0804; this both makes it more clear which outcome is closer to satisfying our desiderata from above and gives us a measure of precisely how much better it is overall than any maximally unequal distribution of expected utilities.[23]
Accordingly, we’ll give the real definition of lottery-lottery dominance more simply as follows:
Lottery-lottery dominance: Let A,B∈Δ2C be lottery-lotteries, and let V∈ΔVC. Then we say that AdominatesB if Gv∈VEa∼∼Av(a)≥Gv∈VEb∼∼Bv(b), and we write A⪰B.[24]
A maximal lottery-lottery is some lottery-lottery M∈Δ2C such that for all L∈Δ2C,M⪰L.
The thing is, we don’t necessary need to steal quite as much strength as that from the Condorcet criterion without giving anything back. Here’s the first major piece of work likely purely my own: the definition of the lottery-Smith criterion.
Lottery-Smith: Divide the candidate-lottery set into candidate-lotteries which should maybe win (frontrunner/lottery-Smith set) and candidate-lotteries who should definitely not win (lottery-dregs), such that any candidate from the frontrunner set beats every candidate from the lottery-dregs set. Then the winner overall should be some distribution over the lottery-Smith set, and in particular, one which is equivalent-in-expectation to one of the distributions over the candidate set we’ve decided is acceptable—we shouldn’t care about telling apart lottery-lotteries that reduce to the same lottery. Equivalently, let V∈ΔVC be an electorate, and separate ΔC as ΔC=SΔ⊔DΔ, such that ∀S∈SΔ,D∈DΔ, Pv∼V(Es∼Sv(s)≥Ed∼Dv(d))>12. Then a lottery-Smith f satisfies fC(V)(SΔ)=1, that is, fC(V)∈ΔSΔ⊆Δ2C.
In the same way that when we passed from Condorcet to lottery-Condorcet, we got a much stronger precondition, the lottery-Smith condition has a much stronger precondition than the classical Smith condition. Accordingly, checks for it will come up less often and have less of an effect on outcomes: now a potentially lottery-Smith-favored candidate has to belong to the lottery-Smith set—not just the Smith set—before it can be crowned an Obvious Strong Candidate which is worth including in some of the lotteries that make up a lottery-Smith lottery-lottery.
But on the other hand, the lottery-Smith condition also hasa weaker precondition than lottery-Condorcet—just as the Smith condition has a weaker precondition to check than the Condorcet condition—and it provides (barely) stronger backup guarantees than lottery-Condorcet—just as the Smith criterion provides (barely) stronger backup guarantees than the Condorcet condition does.
Overall, as far as how the various “candidate-sorting” properties match up against each other:
For how much implied control a property has over electoral outcomes in the average case, assuming it gets to have any: Smith > Condorcet >>> Lottery-Smith >> Lottery-Condorcet
For what properties logically entail which others: Lottery-Smith ⪰ [Lottery-Condorcet ⊥ Smith] ⪰ Condorcet
What do lottery-Smith sets have to look like, apart from the final-probabilities condition? Well, good news...
Proposition: (Lottery-Smith sets all live inside ΔS) Let V∈ΔVC be an electorate over a set C of candidates such that C=S⊔D is a partition of the candidate set into Smith set and dregs and write SΔ⊆ΔC for the lottery-Smith set as in the above definition of the lottery-Smith condition.
We recall the observation above from the proof that every maximal lottery is Smith—namely, that including a dregs candidate anywhere in a candidate-lottery means that that candidate-lottery is dominated by some constructible other one; additionally, we can repeat the construction from the proof replacing candidate s with samples c∼C∈ΔS for any candidate lottery over the Smith set. Thus SΔ⊆ΔS.[25]
...and bad news...
Counterexample: (Lottery-Smith sets range across all of ΔS in the worst case)
Consider the usual example of a maximally muddled voterbase whose social choice graph is nothing but one big Condorcet cycle and whose Smith set is thus all of C, which candidate set is itself small enough at 3 candidates that the global nontransitivity of candidate preference immediately means that no candidate-lottery dominates any other. We can actually pretty straightforwardly cook up maximally nontransitive voter preference distributions in the same general vein as the classic “Rock-Paper-Scissors” one for any odd number of candidates, where the voterbase is split among what I suspect to be something like O(n!)’s worth of different isomorphic noncanonical ways to create the (2n+1)-RPS-digraph of their candidate matchups. Similarly, we can add extra dummy candidates dominated by all existing candidates to ensure that the Smith set is no longer the entire set, if that troubles you. This in turn means that we can’t exclude any element from ΔS=ΔC from the lottery-Smith set.
Thus the sharpest voterbase-agnostic bound we can place on SΔ is simply that SΔ⊆ΔS.[26]
...and good news for people who like bad news.
Proposition: (Lottery-lotteries are strongly characterized by their selectivity of partitions of unity; the best lottery-Smith lottery-lotteries are the maximal ones) Let L be a lottery-lottery over a set of candidates C, and let v∈V be a voter. It suffices to note that Gv∈VEa∼∼Lv(a) pulls back to a total order on Δ2S, which itself induces a total order on elements of ΔS. This is the same order as the one corresponding to the pullback of the analogous utility order on Gv∈VEa∼Lv(a), where L∈ΔS is given by L(c):=Pd∼∼L(d=c). Let UL⊆ΔS be the set of partitions of unity realized by arbitrary subsets of L. If L is strictly nonmaximal, UL can’t consist only of the utility-maximal elements of ΔS - something has to be able to dominate it in expectation. On the other hand, this also lets us strongly characterize what a maximal lottery-lottery M would have to look like—namely, UM⊆ΔSis comprised entirely of utility-maximal elements of ΔS.
We say specifically “utility-maximal” or “utility-nonmaximal” here because the conditions for maximality for candidates and lotteries are now meaningfully different from the maximality condition for lotteries—in particular, utility-maximality is transitive.
As a relevant observation, we should recall that voters vote based on expected utility, and so it can sometimes be more natural to think of elements of ΔC as being the points of a simplex, rather than distinguish lotteries with identical expected outcomes. That way, we can think of our voting system and electorate as giving us a map fV:ΔC→[0,1],L↦Gv∈VEc∼Lv(c) from the simplex to the unit interval; to some extent, all we care about is the preimage of expected utilities in [0,1].[27][28] Ties are a lot more common for lottery-lotteries,[29] and that’s before even considering the rare random tie grounded in some indifference of outcomes which just happen to have the same expected values under the electorate-utility-function—in such a case, two such possible maximal lottery-lotteries need not even share any overlap in the set of partitions of unity realized by subsets.[30] Thankfully, the latter ties depend on the voterbase and are thus likely not generic.[31]
Working over lotteries instead of lottery-lotteries has no equivalent to this extra reduction step, either way we might slice it. We didn’t need either half of this extra step, since maximal lotteries already are their own final probabilities over candidates, and we were totally fine with the maximal lottery being a lottery over several candidates, because there wasn’t an additional level to have to think about or reduce from. But now, not only is it possible for any final distribution over candidates to be realized in entire families of different equally valid ways, but it’s not even guaranteed that the voterbase-favored effective final lottery over candidates is unique! I am almost certain that this is some substantial part of the true reason for the ‘fractal structure due to Nash-equilibrium constraints’ Scott posits.[32] Additionally, because ΔC,Δ2C are compact and can be realized as simplices in Euclidean space, and because the geometric-expected-value evaluation maps eΔ:(ΔC,VC)→[0,1],(L,V)↦Gv∈VEc∼Lv(c), eΔ2:(Δ2C,VC)→[0,1],(L,V)↦Gv∈VEc′∼∼Lv(c′) are continuous for any finite choice of voterbase,[33] we know that by the extreme value theorem, these maps achieve their respective maxima somewhere on their domains. Accordingly, we’re fine to just assume that they exist, but will generally not be unique.
This all presents a strong impression of what a maximal lottery-lottery would have to be—some geometrically-maximizing lottery over candidate-lotteries which are themselves game-theoretically-dominant lotteries over Smith candidates (who are themselves game-theoretically non-dominated).
Better-apprised of the benefits of geometric maximization, a little more effort now results in not only the proof we were hoping for earlier, but a substantially stronger existence result.[34]
Theorem: (Maximal lottery-lotteries exist in principle) Consider the following two-player zero-sum game between Alice and Bob: Alice and Bob pick candidate-lotteries A,B∈ΔC and draw voters from some at-most-countable electorate V∈VC. Then they get reward rAlice=Gv∈VEa∼Av(a)−Gv∈VEb∼Bv(b); rBob=−rAlice. Since ΔC is compact and (crucially) Alice’s (and thus trivially Bob’s) utility function is continuous, this game has a family of mixed strategy equilibria taking the form of a lottery over candidate-lotteries. Because the condition for strategic dominance in this game corresponds to our definition of lottery-lottery dominance, a Nash equilibrium for this game corresponds to a maximal lottery-lottery.
This looks like it’s just a patch to the earlier lemma to make it into an existence proof merely by taking the abandonment of ordinality maximally seriously. It is not. It rests heavily on the revised definition of lottery-lottery dominance I considered it justified to pick, given our other philosophical desiderata for voting systems and our abandonment of ordinality and homogeneity.
Corollary: By Nash’s equilibrium theorem, we can in general effectively compute the set of maximal lottery-lotteries from the utility functions in V along with their vote-shares, and that the set of those equilibria is convex—this corresponds to the fact that a convex combination of maximal lottery-lotteries is itself maximal.
In particular, we can characterize lottery-lotteries not only by their final probabilities over candidates, but also by how much probability they allocate to each candidate-tagged partition of unity, where each of the candidate-tagged partitions of unity would result in the same geometric-expected payoff as any other.
Now, existence results are near and dear to my heart as a mathematician by inclination and by training both, but I also believe in winning maximally and, on finding checkmate, looking for better. So it’s certainly better than nothing… but it’s still not good enough. We want a more constructive process or some characterizing set of theorems.
And we’ll get to that! Although a natural semi-constructive option should be jumping out at us now,[35] we first need a better understanding of how precisely a maximal lottery-lottery would have to handle joins of populations if we want to be sure that this definition of a maximal lottery-lottery holds up the way we should want it to for joins of populations, and so my definition of the modularity criterion, which replaces both participation and consistency, will have to wait for the next post, as will my construction(s) of a maximal lottery-lottery. However, this seems like a natural point to break the post—some light experimentation suggested that if I make this single post any longer, anyone reading it will rapidly fall off in energy and interest, and that’s not what I want.[36] So I’ll end this post with the pure existence result, and when you feel up to reading about how I define modularity and how I construct maximal lottery-lotteries, click here.
Throughout this post, I consistently use the following font conventions for mathematical objects:
Lowercase letters in normal Computer Modern (like a,b) are always candidates or voters.
Uppercase letters in normal Computer Modern (like A,B) are always lotteries or voterbases.
Blackboard bold is only ever used as an object for the voterspace VC. It also gets reserved for use as the expected value symbol E and the geometric mean symbol G.
Fraktur is only ever used for the candidate set C and the dregs set D. I would also have used it for the Smith set S, but \frak{S} is famously bad. I thought it was a G for years until grad school because it used to be standard for the symmetry group on n letters. Seriously, just look at it: S.
The calligraphic font is only ever used for lottery-lotteries (like A,B) and for the Smith set S.
Here’s a guide to the standard mathematical notation I use extensively here:
Morphism/Hom-sets - Hom(A,B) - the set of functions from A to B
Disjoint union - A⊔B - the union of the sets A and B, which are either implicitly assumed to be disjoint or else elements in the union are tagged with their set of origin; sometimes used to implicitly partition another set (e.g. S=A⊔B⊔C; C=S⊔D (the partition of the candidate set into Smith and dregs))
Sampling/drawing from a probability distribution - x∼X - x is a random draw from X
Sampling/drawing from a lottery-lottery - X∼X; x∼∼X - X is a randomly drawn lottery from X; x is a random draw from a randomly drawn lottery from X
Probability notation - Px∼Xe(x) - the probability of e(x) happening, where x is drawn from X
Expected value notation - Ex∼Xf(x) - the expected value of f(x), where x is drawn from X
Geometric mean notation - Gx∈Xf(x) - the geometric mean of the values of f(x), where x ranges over X
Weighted geometric mean notation - Gp(x,y):=xpy1−p - the p-weighted geometric mean of x,y
Candidate sets - c∈C - an set of arbitrary elements (candidates) and possibly lotteries on those candidates
Smith sets and dregs - C=S⊔D - the candidate set is partitionable into a Smith set (guaranteed to be nonempty) and a dregs set (about which there are no guarantees)
Candidate lotteries - ΔC - the set of candidate-tagged partitions of unity; the set of probability distributions of the set of candidates
Utility-functions - VC≅Hom(C,[0,1]) - the set of functions from the set of candidates to the unit interval[37]
Voterbases/electorates - V∈ΔVC - vote shares (or probability distributions) over the set of utility functions on candidates
Candidate lottery-lotteries/distribution-distributions - Δ2C - the set of candidate-tagged-partition-of-unity-tagged partitions of unity; the set of probability distributions of probability distributions of the set of candidates[38]
Domination - A⪰B; A⪰B - object A dominates object B; as an abuse of notation, property A logically entails property B.
The quick version, which I detail slightly more at the end of Lottery Outcomes above and which Scott writes up in Maximal Lotteries, is that if you allow for the possibility of lottery outcomes (which don’t even have to crop up most of the time), you can have voting systems that defy Arrow’s impossibility theorem by being both Condorcet and consistent.
This is fully general advice for reading math papers or anything with lots of abstract quantitative content to it, and is meant for almost literally anyone who hasn’t taken an proof-based course in math. I would provide it almost verbatim to anyone who asked me for my professional advice on how to avoid bouncing off a math paper or how to successfully read one. Every time I neglect it myself when reading math, I end up wasting time and regretting it and cursing my folly as I pick thorns and bramble out of my face. It is good advice; bitter like leafy greens. Please take it.
If you’d rather think of it as a multiset, that’s cool too. That technically violates homogeneity, but we’ll actually have to massively weaken that assumption later on anyway when we start talking about conditions on outcomes over joins of electorates, so it’s really down to taste and which way of thinking about, seeing, and manipulating the underlying objects comes most naturally to you. They’re basically equivalent.
Yes, you read that correctly. A distribution. Like, a probability distribution. And yes, the “determinism” condition introduced immediately afterwards completely obviates this setup—the output distribution is required to assign all of its probability-mass to a single outcome—such that the move to lottery outcomes is extremely telegraphed.
Scott doesn’t give these properties names in his writeup and I can’t find standard names for them anywhere else. I’ve thus chosen to given them evocative names for myself.
“Whenever we pick a random voter with utility function v from the electorate V, if the probability that v ranks c above d is always >12, no matter the d, that means that a majority of the electorate prefers c to d, and fC(V)(c) therefore assigns weight 1 to candidate c as the Obvious Correct Champion.”
In Scott’s writeup and likely elsewhere, this is called “clone-invariance” (also known as independence of clones), which is actually a slightly weaker form of the criterion found in Arrow’s impossibility theorem, independence of irrelevant alternatives (IIA). Cousin-properties also weaker than IIA include local independence (deleting the winner and deleting the last-place loser must each leave the other candidates’ relative finishes unchanged), independence of worst candidates (deleting the n worst candidates must leave the other candidates’ finishes unchanged), and Smith independence (if for some proper subset of the candidates (the Frontrunners), every candidate in the set wins all its head-to-head matchups against every candidate outside the set (the Dregs), then deleting all of the Dregs must leave the Frontrunners’ finishes unchanged).
You know, the same way that toruses would prefer to live in R4 so they can lie everywhere-locally flat, and how all nitrogen’s favorite home in chemical-space is as nice stable dinitrogen, and how rays of light really do want to move in straight lines, honest, it’s just all this rest mass hanging around distorting the metric.
In one very austere sense, this is all that need be said in this entire post: change the dominance criterion from game-theoretic to Nash-bargaining, as realized by a geometric mean. The rest is careful commentary.
Unfortunately, the lottery-Condorcet condition is way weaker than the ordinary Condorcet criterion, so unlike for maximal lotteries, we’re likely going to have to break out the weird-shaped dice and the spinners and the quantum-random bits and such.
Strictly speaking, this actually describes something determined by the weight value p - that is, q=1−pp, which ranges over 0≤q<∞ when 0<p≤1. This way, though, we don’t have to even start relitigating (in the prose commentary!) whether or not it makes sense to talk about arbitrarily small positive numbers.
Why the factors of 2 in the exponents under the square root? Because this needs to be a straightforward generalization of the unweighted geometric mean. Try it yourself for p=12.
Unless you like nonprincipal ultrafilters way more than I do on any given day, anyway. Most of the time, I find them spooky, and I generally require extra caffeine before I’m willing to deal with them.
Seriously, go read The Lottery in Babylon. It’s like four pages long, very fun, and exquisitely thematically relevant to us here. In fact, it would make an excellent palate cleanser for between the two posts.
We should treat this as a regularity condition on the voters and the candidates both, given that it implies that a voter won’t/can’t have utility 0 on more than one candidate. It’s not that strong a condition—if a significant number of voters maximally hate the entire candidate set, that implies problems about the candidate set that no clever voting system can fix.
SCENE: Daytime, exterior. Standing in a row are the candidates for the upcoming election for World Ruler: Alice, Bob, Claire, a twenty-foot-tall thin and misshapen figure in a large trench-coat parts of which are partially transparent, Daniel, and Eve. They have come before The World Electoral Commission, which has controversially banned lottery-duplicate candidates, and which has gathered these notables to question them on suspicion of an illicit lottery-duplicate candidate having managed to register anyway. Tension mounts as the questioning enters its second hour—they’re in for a long day.
We might also want for whatever the function on expected voter utilities to be effectively computable, given those utilities, and also compositional over joins of electorates. Unlike the more important properties listed in the text proper, the arithmetic mean definition would satisfy both of these and the game-theoretic definition only sometimes satisfies the latter.
Here’s a sketch of yet another way of convincing yourself that we don’t need to worry too much about abandoning strategyproofness as dominance criterion. It’ll require you to think of our voting systems in the frankly pseudohomogeneous case—i.e., as being about a multiset of utility functions—and it also won’t make sense until you’ve read about modularity in the next post, but if you don’t want to go read that, it’ll be fine to think of a modular voting system as a voting system that’s almost consistent in a way that plays nicely with joining populations with weighted geometric means.
The key things to notice are that a voter’s preferences over lotteries are entirely determined by their preferences over candidates, that they assign distinct nonnegative utilities to each candidate, and thus that no clever voting plan gives any individual voter more control over outcomes, after the geometric averaging, than any other. The first two mean that there’s at most a single lottery that any voter has expected value constant-0 on: the one assigning probability 1 to whichever candidate that voter likes least. In particular, a voter can’t rig their utility function so as to claim to assign utility 0 to almost all lotteries, as would be an obvious strategy for hijacking an electorate, and no voter has more control over outcomes than any other, as we’ll see more explicitly when I sketch a construction of maximal lottery lotteries. In some sense, none of the sharply dominating voting strategies you could implement for rigging an election over candidate-lotteries—even assuming the election takes voters’ utility functions into account—work for dominating voting in a maximal lottery-lottery, or even a modular lottery-lottery, because they’re not even in the strategy set available to the voters. As a direct result, the only plays available to individual voters are the “fair” ones implying linearly self-consistent things about their own preferences, so that every voter has the same amount of steering power over outcomes.
By contrast, the lottery-outcome given by random-dictatorship of subpopulations, (0.51A,0.49C), has geometric-expected utility (0.51⋅0.6+0.49⋅0)0.51⋅(0.51⋅0+0.49⋅1)0.49=0.3060.51⋅0.490.49∼0.3854 (and arithmetic-expected utility ∼0.411), a massive improvement over either one!
Yes, I’m still going to write ⪰ instead of ≥. No, I don’t care that this is now a total order induced by the canonical pullback of the canonical order on the reals. I want a symbol specifically for dominance-and-maybe-not-literally-greater-than, and I’m the one writing the post.
Lemma: (Maximal lottery-lottery lotteries are always Smith) We’ll prove this by contradiction for the case where there’s even one dregs candidate anywhere in the maximal lottery-lottery.Let C=S⊔D be a set of candidates divided into Smith set and dregs. Let M be a maximal lottery-lottery, and suppose that for some lottery L∈M, Pl∼L(l=d)>0, for some d∈D. We construct the lottery-lottery N as follows. Pick some s∈S. For all M∈M∩ΔS, N(M)=M(M), and for all M′∈M such that M′(d)>0, let N′(t)=M′(t) for all t≠s∈S, N′(d)=0, and N′(s)=M′(s)+M′(d); then N(N′)=M(M′) - that is, the lottery-lottery that outputs the same things that M does, except that any measure that any of the lotteries with any positive measure in M would call to be given to candidate d is instead given to candidate s from the Smith set. We take samples v∼V,M∼M,N∼N; rather than try to directly calculate PM,N,v(v(N)>v(M)), PM,N,v(v(N)>v(M)), we find that we can pass to the difference between M,N - that is, PM,N,v(v(N)>v(M))≥PM,N,v(v(N)>v(M)) iff N⪰M iff P(v(d)>v(s))≥P(v(s)>v(d)), a contradiction. Including non-Smith candidates gets a lottery dominated, and it equally well gets a lottery-lottery containing such a lottery dominated; therefore, every lottery in a maximal lottery-lottery must be Smith.
Generally, the lottery-Smith set will be some convex subset of ΔS. In fact, for ⊆c taken to mean “is a convex subset of, within the appropriate domain”, S⊆cC, SΔ⊆cΔS⊆cΔC.
We use the geometric mean over voters here (weighted by vote-share) instead of taking the arithmetic expected value sampling over the voterbase because later on, we’ll use geometric maximization extensively for its compositional properties and fairness guarantees. I decided to include it in the definition here rather than keep it a mystery until after I talk about geometric maximization for the sake of clarity. If you don’t know yet what’s going on with it, either don’t worry about it for now or just skip to the section about geometric maximization and come back.
Take the example (for the maximally muddled 3-voter electorate) of the lottery-lotteries (13(1A),13(1B),13(1C)) and (16(1A),16(1B),16(1C),12(13A,13B,13C)) - and the entire rest of the family (p3(1A),p3(1B),p3(1C),3−3p(13A,13B,13C)), for 0≤p≤1, including (1(13A,13B,13C)). All of these are equivalent to the same maximal lottery-lottery—the one for which U={(13A,13B,13C)} - and these aren’t even all of the lotteries that can show up for the maximal lottery-lotteries for this voterbase.
Here’s a very simple example of this kind of failure happening maximally badly in action: take a candidate set of just three candidates, A,B,C, and a voterbase of just two voters, v,w, such that v(A)=1,v(B)=0,v(C)=12 and w(A)=0,w(B)=1,w(C)=12. That is - C is a compromise candidate. Then any of the family of lotteries given by (p(12A,12B),(1−p)(1C)),0≤p≤1 all tie each other for expected utility, despite giving any candidate probability to C you like! And it should—for all intents and purposes, C is indistinguishable from (12A,12B).
The rest is covered by the compositionality you get from the use of weighted geometric means, and the equivalence between the “mash them all together immediately” and “assembly by means of full binary tree” approaches that I describe in the next post.
To see this for the lottery case, note that for any voter v∈V, Ec∈Lv(c) is continuous over L∈ΔC, and then we take the geometric mean of a countable number of continuous functions, the result of which is still continuous under the appropriate metric. (If you don’t like that, pass to finite instead. Its support is also limited to parts of ΔC where no voter has expected utility 0.) The lottery-lottery case is identical, except that we first reduce to a lottery under the usual map L↦Pc∼∼L(c=ci). All of this probably also works for the case of countably many voters, but I don’t know how to prove that, and anyway we’ll see later that we can pretty much just pass to the case where |V|<∞. This kind of approach also lets us trivially prove things like “every continuous map from lottery-lotteries to lottery-lotteries has at least one fixed point; this includes every continuous map which is weakly increasing in geometric expectation of utility”, which you may spot as yet another avenue I might have picked to prove the existence of maximal lottery-lotteries. I don’t actually know if that proof sketch works, though.Δ2C would, at the absolute least, have to be endowed with an appropriately-motivated topology.
I’m not going to lie, when I set out to prove something like this theorem, I wasn’t expecting to prove anything more than that a maximal lottery-lottery would have to be lottery-Smith. I totally underestimated this approach, judging that no way could just patching the lemma from earlier work. Turns out: full utility data is really powerful.
I’ve described it elsewhere in the post and so won’t here. If that option isn’t jumping out at you… I want to say “you should think about it more carefully” but really I don’t know how hard it is to see the answer. I think it’s worth taking a literal clock-time five minutes to see if you can spot it; and if you still can’t, fair play—but better if you make your own guesses first.
Water/a drink, caffeine/meds, maybe a light snack, definitely a stretch and a pace.[15] That’s my recommendation, anyway. The second post will almost certainly still be there after you’re done and you’ll be better suited to process it.
(Geometrically) Maximal Lottery-Lotteries Exist
epistemic/ontological status: almost certainly all of the following -
a careful research-grade writeup of a genuinely kinda shiny open(?) question in theoretical psephology that will likely never see actual serious non-cooked-up use;
dedicated to a very dear cat;
utterly dependent, for the entirety of the most interesting parts, on definitions I have come up with and results I have personally proven partially using them, which I have done with a professional mathematician’s care; some friends and strangers have also checked them over;
my attempt to prove that something that can reasonably be called a maximal lottery-lottery exists;
my attempt to scavenge what others have left behind and craft a couple of missing pieces, and then to lay out a blueprint for how it could begin to work;
not a 30-minute read
the first half of something incomplete
Maximal Lottery-Lotteries: Far More Than It Ever Occurred To You To Want To Know
This post is mostly a distillation/concentrated treatment of the Maximal Lottery-Lotteries Sequence, though I define an important property and prove an important existence result at the end. It’s the first of two posts in a sequence, the second of which is linked here.
The Maximal Lottery-Lotteries sequence details why anyone should care about sortitive/lottery-using electoral systems even without any particular hope of getting to implement it.[1] It also ends in early November of 2022 with the author and a handful of other technical alignment notables all honorably giving up on the shiny math problem with varying degrees of explicitness. This post is meant as a follow-on to the sequence, and its sequel is, too. I also end up drawing heavily on the Geometric Rationality sequence for its central tools.
As per my usual policy, if I’ve managed to misunderstand anything or write anything up unclearly or incorrectly, or you have any questions about what I’ve written, please comment below or at https://www.admonymous.co/lorxus and I’ll fix it/reply to it when I can.
If you’re reading this post for the first time, you might want to keep the notation reference on hand. If you’re relatively inexperienced with reading text which treats dense mathematical notation on par with English prose, slow down and make sure you understand what each mathematical expression really means or what object or type of object it refers to before continuing. “Is it a nonce-object instantiated just for the proof, or does the notation suggest anything more about its type or identity?” If you can answer that kind of question without much effort, it’ll make understanding any mathematically-dense text much easier. Mathematical notation is extremely compact and precise; if I tried to write all the below purely in prose, it’d be three times as long and a tenth as clear—but all compression comes with compute costs, and math notation is no exception.[2]
Generalized Voting Theory
Let C be a set of candidates, VC≅Hom(C,[0,1]) the voterspace for C, which is also the set of all utility functions on C (or candidate preferences), and V∈ΔVC an electorate (or voterbase), which is a partition of unity tagged by candidate preferences which we may think of as the “voter share” with each candidate preference.[3] Then a voting system f:ΔVC→ΔC is a function taking a distribution over voterspace on C to a distribution over C.[4] We write fC(V) for the outcome of voting system f with choice of candidates taken from C and polling electorate V; we’ll often suppress the subscript when the candidate set is clear.
This is classically subject to the following four constraints:[5]
Candidate finiteness: |C|<∞, that is, the set of candidates is finite (else we’ve got lost and ended up somewhere in social choice theory).
Ordinality: f is invariant under precomposition by preorder-respecting functions applied to utility functions—it only uses the preorders on candidates implied by the utility functions (else we need cardinal information, as per score voting, or we allow some preference collapse, as per approval voting).
Determinism: fC(V) always assigns measure 1 to some ci∈C, no matter what C or V are (else we have a sortitive or otherwise nonclassical voting system).
Regularity: the input distribution to f is always the uniform distribution on some finite set (which is a regularity condition to avoid ghost candidates, unequal votes, and other weird possibilities we don’t care about).
We can generally ignore regularity, and (spoilers) we’ll soon be relaxing determinism, too. At the end, we even get rid of ordinality.
Now for a few desirable properties voting systems can have. We’ll phrase them in our language of functions and voting outcomes, which lends itself so much better to generalizing electoral outcomes from deterministic outcomes to lotteries:
Condorcet: Whenever some candidate beats every other candidate in a head-to-head matchup, that candidate should win overall.
Equivalently: assume that for some c∈C, it is the case that ∀d∈C,Pv∼V(v(c)>v(d))>12 whenever d≠c. Then a Condorcet f satisfies fC(V)(c)=1.[6]
Smith: Divide the candidate set into candidates who should maybe win (frontrunner/Smith set) and candidates who should definitely not win (dregs), such that any candidate from the frontrunner set beats every candidate from the dregs set. The Smith set is always nonempty, though the dregs set might be empty in the usual irresolvable situations. Then the winner overall should be some distribution over the Smith set.
Equivalently: assume that we can separate C as C=S⊔D such that ∀c∈S,d∈D,Pv∼V(v(c)>v(d))>12. Then a Smith f satisfies fC(V)(S)=1, that is, fC(V)∈ΔS.
Consistency: Two distinct voterbases each giving the same result as each other should mean that after joining the voterbases, the joined voterbase should still give the same result as before.
Equivalently: assume that fC(V)=fC(W)=D∈ΔC. Then for all choices of intermediacy parameter p∈[0,1], a consistent f satisfies fC(p⋅V+(1−p)⋅W)=D.
Participation: It should never be the case that a voter can predictably do themself better by not voting. This is notably a limiting case of consistency, as one subpopulation becomes increasingly small and certain.
Equivalently, let W∈ΔVC be an electorate, v∈VC a voter, p∈[0,1] an intermediacy parameter. Then a participatory f satisfies v(fC(p⋅v+(1−p)⋅W))≥v(fC(W)), where we interpret p⋅v+(1−p)⋅W as the p-parametrized mix of W and the element V∈ΔVC that happens to put measure 1 on v, or equivalently as something for which sampling from it means first choosing between v with probability p and otherwise a draw from W with probability 1−p, and where for a lottery D∈ΔC, we interpret v(D)=Ed∼Dv(d).
Dupe-resistance:[7] If there’s two or more candidates in the dupe set D={d1,d2,⋯,dn} where for all voters v∈V and candidate c∈C∖D, there is a d∈D such that v is indifferent between d and each of the di, and v always has the same ordinal preference between c,di for every di, then the outcome of the election should be identical, for every candidate outside the dupe-set D, to if the election were rerun with a single candidate d replacing all of D.
Equivalently, let C=ˆC⊔{d}⊔D, and suppose that ∀c∈ˆC,di∈D,
v∈V∈ΔVC, we have v(c)>v(di)↔v(c)>v(d) and v(c)<v(di)↔v(c)<v(d). Then a dupe-resistant f satisfies fC(V)(c)=fC∖D(V)(c) for all c∈ˆC, that is, c still wins exactly as much as if there were just no d-dupes.
Famously, Arrow’s impossibility theorem says that for a classical voting system (i.e., one that deterministically spits out a single candidate at the end, among other requirements), you can’t have all of these. You can’t even have the Condorcet criterion, dupe-resistance, and consistency at the same time—isomorphic to Arrow’s usual sorrowful guarantee. You can’t even have both participation and the Condorcet criterion at the same time! A pall of soot falls over the Sun, and democracy dies to thunderous applause.
Also, I’ve cheated a little here, because that’s technically not the standard definition of the Smith criterion. That’s because the real standard definition is actually just wrong for our purposes—it always talks about such a voting system always picking its unique winner from the Smith set. We recall that voting theorists usually assume determinism of outcomes, which we have long since made up our minds to discard—if we were to put that assumption back, we recover the classical definition of the Smith criterion.
But look—phrasing it this way means that the Smith criterion generalizes itself when we pass from deterministic elections to lotteries, and it even gets much more powerful and flexible when we do! We should think of the Smith condition as being “the Condorcet condition, but one which fails more gracefully and has a reasonable plan in case there’s no Condorcet winner”. For deterministic elections, that additional strength doesn’t count for much more than some annoying noise in the outcomes. However, this weakness—that if we force the Smith set to do all the final choosing for us, we then have to blame it for sometimes producing unsatisfying outcomes—is one we already have the tools to trivially resolve. We simply leave it to a lottery, and the Smith criterion’s issues fall away. Sortitive voting systems are the Smith criterion’s true home.[8]
The Smith criterion’s precondition is strictly weaker than that of the Condorcet criterion (because if there’s a Condorcet winner, then the Smith set is a singleton containing just that candidate) and its backup guarantees are strictly stronger than that of the Condorcet criterion (some admittedly weaker control over the outcome of the election, as opposed to nothing at all) so overall the Smith criterion is a more empowering property than the Condorcet criterion is. That said, it’s not by all that much—for example, in a worst-case scenario where the voterbases’s collective preferences over candidates is maximally-nontransitive, the Smith set is just the entire candidate set.
Lottery Outcomes
We start by relaxing determinism and ceasing to bother tracking regularity from among the assumptions in the previous section; the immediate consequence of this is that fC(V) is in general a lottery D∈ΔC rather than an individual candidate (or in reality a measure-1 chance of drawing some specific candidate) as might have been tacitly assumed in the previous section.
Probabilistic outcomes also require actually formulating a definition of what it means for one candidate to beat another. Let A,B∈ΔC be candidate-lotteries, V∈ΔVC an electorate, and for the sake of notation take draws a∼A,b∼B,v∼V. Then A dominates B (and we write A⪰B) if Pa,b,v(v(a)>v(b))≥Pa,b,v(v(b)>v(a)). In slightly more prose, we take independent draws a,b,v from A,B,V, and then compare v(a),v(b)∈[0,1], and if it’s more often or more likely the case that (after we’ve drawn all our samples) v prefers a to b, then A dominates B. Importantly, dominance is not a necessarily transitive relation, but it still only requires ordinal preference rankings and so our assumption of ordinality is safe for now.
A maximal lottery is some lottery M∈ΔC such that for all candidate-lotteries L∈ΔC,M⪰L.
Additionally, M is Condorcet, consistent, dupe-resistant, and participatory. Indeed, M is characterized as the unique candidate lottery which is Condorcet, consistent, and dupe-resistant. We’ll only prove the first two of those as both the most important and most surprising pair, in light of Arrow’s impossibility theorem, and because the proof of dupe-resistance actually follows from the below lemma that maximal lotteries are Condorcet. There’s nothing particularly fancy or clever going on here, just a chase through some linear-algebra and a little bit of definition-prodding:
With a tiny bit of effort, I can extend this result to the Smith criterion. Unlike the previous lemmas, I suspect that this is an original result, if a modest one. As above, there’s nothing too fancy going on here, just some linear-algebra manipulation and definition-prodding:
It’s worth noting that on the other hand, a maximal lottery need not assign positive measure to every candidate in the Smith set, as far as I can tell. And why should it? Maybe one of the candidates in the Smith set is vastly worse than any other. Maybe three of the candidates in the Smith set are all vastly worse than all the rest, but they happen to form a Condorcet cycle among themselves and the very worst one overall has a tiny edge over some other candidate in the Smith set, so none of them is quite dominated by the entire Smith set. Being a Smith candidate is no guarantee of being any good as a candidate! It’s also worth noting that being Smith, consistent, and dupe-resistant is equally much a ~unique characterization of a maximal lottery as being Condorcet, consistent, and dupe-resistant, since Smith implies Condorcet.
Geometric Maximization, Geometric Mean
The next piece of the puzzle to introduce up front is one that didn’t make its way into the initial maximal lottery-lottery sequence. That piece is called geometric maximization. The heart of the motivation for using weighted geometric means and geometric maximization is this: geometric maximization is an ironclad defense against the very worst of the tyranny of the majority, because if even a single utility function in a set is constant-0 on expectation, the geometric mean over all utility functions in the set is necessarily constant-0, too. Because geometric maximization characterizes Nash bargaining, it fits the bill if you model voting as modeling civil war where the winner is neither the best-resourced side with certainty nor determined perfectly proportionately to resources, but rather the division of spoils is instead as given by the outcome of the high-stakes cooperative bargaining (with anarchy as default outcome) the soldiers might be engaging in if they could think and negotiate fast enough to avoid a war, which is better than both. For this reason, when we want to maximize “the expected utility of the whole voterbase”, what we’ll maximize is the geometric mean of the expected utility of each of the voters.[9][10]
It’s worth noting at this point that if we want to make use of weighted-geometric-mean-based tools, we also technically have to give up on homogeneity, because now, we really do (and should!) care at least a tiny bit about the absolute number of voters in the populations we’re joining together. In reality, we only care about which subelectorate is smaller (or less powerful, or less important, or less significant, or...) than the other one, and by what multiplicative factor.[11] Sometimes the factor is small enough to work with, and other times the factor is so large that we should basically just check if a certain statement holds for all weight-values above some real number. Ultimately, though, we’ll ignore the weakening of homogeneity, given that we’re not doing anything overly bad/non-homogeneous[12] - everything we care about is still completely scale-invariant, as long as we apply the same scale transformation to all voterbases—and thus we preserve most of the homogeneity property, requiring only the added information of electorate size/population/”power factor”, and additionally require that the parameter p is always the share of the voterbase power factor of the smaller/weaker voterbase with respect to both voterbases, such that 0<p≤1. Call it quasi-homogeneity if you want—if you were to fix a positive integer once and for all and duplicated each voter that many times, the outcome of the election shouldn’t change.
The geometric mean of two numbers is given by Gx∈{A,B}x=√A⋅B; similarly, the geometric mean of a finite set of numbers is given by Gx∈{xi}ni=1x=n√Πixi. The p-weighted geometric mean of two numbers is given by Gp(A,B):=√A2p⋅B2−2p=Ap⋅B1−p, where 0<p≤1; we’ll use this notation to simplify expressions later on.[13]
This allows us to straightforwardly define a weighted-collective-utility function on the join of two voterbases (or of a voterbase with a single voter) in a way that has some hope of avoiding the usual failure modes of another possible approach which makes use only of arithmetic means:
Thankfully, in practice we may always assume that we’re always dealing with two voterbases of vaguely comparable sizes, rather than joining a single voter or a finite voterbase to a potentially infinite one. Why? If we join a finite voterbase to an infinite one and have chosen to weight the two proportionally to their population sizes, then we just throw out the results from the finite voterbase, and if either of the voterbases is uncountable, Arrow’s impossibility theorem no longer holds, so there’s not much point to any of this.[14] So either the voterbases we want to join are both of finite size, in which case we have an easy principled answer—proportional weighting—or they are both countably large, and we should have already chosen how we want to weight the two against each other (with p=12≅q=1 as the principled default choice).
Lottery-Lottery Outcomes and the Lottery-Smith Criterion
A lottery-lottery is what it’s called: a lottery where some or all of the outcomes you can draw are themselves often an instruction to draw from a lottery. Some things change, and more things don’t. A crucially important thing that does not change is the type signature of an electorate, which is still that of a distribution over (or again, if you like, multiset of) functions from the set of candidates to [0,1]; now, voters v∈V - who still have the type signature of a single function fron the candidate set to [0,1] - vote for lottery-candidates in ΔC based on the voter’s expected utility Ec∼Cv(c).
Such strength as we require lottery-lotteries to have will come at a cost. When we passed from classical voting theory to sortitive voting theory, we lost determinacy and discarded regularity. How much will Mathematics demand of us in weakening of properties of candidate-lottery-lotteries from candidate-lotteries, if we want for lottery-lotteries to be an even-handed function for social-preference aggregation? And can we claw anything back?
The first thing to go is the assumption of ordinality. As in the linked illustrative example of a 3-person electorate points out, ordinal preferences are no longer enough to tell us what lotteries should go in a desirable lottery-lottery. Also, if we tell our voters to rate candidates cardinally, we don’t want to have the failure mode where a voter getting an expected utility of 0 means anything other than that their least-favorite candidate is a guaranteed outcome—it should be very hard to get an expected utility of exactly 0. So we’ll replace it with the following slightly weaker regularity condition:
Weak ordinality:[16] Let V∈ΔVC be a voterbase. Then every v∈V is injective—that is, for all c≠d∈C, v(c)≠v(d), so that each voter could in principle rank all candidates linearly.
Next, after Scott, we choose to overtax the Condorcet criterion to empower dupe-resistance:
Lottery-Condorcet: Let V∈ΔVC be an electorate. If for some c∈C and all D∈ΔC, P(v(c)>Ed∼Dv(d))>12 whenever d≠(1c), a lottery-Condorcet f satisfies fC(V)(c)=1.
This is a stronger precondition than the classical Condorcet condition, checks for which will thus come up less often and have less of an effect on outcomes: now the putative lottery-Condorcet-favored candidate has to beat every single dupe-trenchcoat[17] on top of all the real candidates before it can be crowned Obvious Correct Champion.
Dupe-trenchcoat-resistance: Let D∈ΔC be a lottery-candidate, and assume all voters v∈V∈ΔVC vote for candidates in C=C∪{D} based on Ec∼Cv(c), where the real candidates can be interpreted as being given by distributions that assign probability 1 to a single candidate. A dupe-trenchcoat-resistant voting system f is one for which the outcome satisfies fC∪{D}(V)(c)=fC(V)(c) for all c∈C.
This is a more powerful condition to rely on always being true than the classical dupe-resistance criterion: now the voting system must be able to ignore all dupe-trenchcoats on top of still needing to be able to ignore any dupes of single candidates. Indeed, as the original writeup remarks, passing to a maximal lottery-lottery is just like adding all possible lotteries over candidates to the set of candidates,[18] and then running a maximal lottery over the result. For a dupe-trenchcoat-resistant voting system, it doesn’t matter how many suspiciously-trenchcoated dupe-lotteries enter the election—the outcome of the election remains invariant.
Finally, we give up the game-theoretic/strategyproof definition of dominance, which was as follows:[19]
But why? Why would we accept such an act, and how could we ever choose something to replace so elegant and so innately appealing as the game-theoretic definition of dominance? The problem lies in the kinds of tradeoffs the strategyproof definition would endorse making, which make perfect sense from a game-theoretic perspective but which would be a terrible idea to have a voting system endorse. To see this, let C={A,B,C}, and let V∈ΔVC be given by 0.51:[A:0.6,B:0.55,C:0],0.48:[A:0,B:0.01,C:1],0.01:[A:0,B:0.55,C:1]. Then under a game-theoretic definition of dominance, picking candidate A with probability 1 would be a maximal lottery (and any lottery-lottery that reduced to it would also be maximal).
But this seems absurd! If we want the dominance of one lottery-lottery over another to mean that we should prefer that the voterbase get the first lottery-lottery outcome over the second, then it seems desirable for that that sense of electoral dominance to be some relatively mathematically simple one which happens to be both utilitarian—preferring outcomes where everyone is on the whole cardinally better off, especially if that’s true for each voter—and egalitarian—dispreferring outcomes where anyone is much worse-off, especially if it’s cheap in overall cardinal utility to make the worst voter less badly off. Manifestly, the game-theoretic definition of dominance, while elegant and inherently strategyproof, does not do these things very well if at all. If we’re lucky, there might even be some natural scoring rule that marks the function of voters’ expected utilities that our notion of dominance relies on as optimal for proportional representation.[20] Luckily for us, just such a function exists—the geometric mean!
Though perhaps all of this is a just-so story, shadows on the wall cast by the real reason we should find geometric means so compelling—the fact that it characterizes the Nash solution to cooperative bargaining as generalized to players of differing weights, which itself is an optimal-ish way to mix utility functions with differing weights with some mathematically nice properties reminiscent of some of the ones we’ve defined and cared about here; even if a lottery-lottery looks like it’s not strategyproof on object level, we can appeal to some embedding of this voting problem into the cooperative bargaining problem to allay such worries.[21]
Note that even passing to arithmetic means wouldn’t suffice to rule out this setup:[22] always electing A gives V an expected utility of 0.306, while (for example) always electing B gives V an elected utility of 0.2908. On the other hand, taking geometric means means electing A gives V a geometric expected utility of 0.0, and electing B gives V a geometric expected utility of ∼0.0804; this both makes it more clear which outcome is closer to satisfying our desiderata from above and gives us a measure of precisely how much better it is overall than any maximally unequal distribution of expected utilities.[23]
Accordingly, we’ll give the real definition of lottery-lottery dominance more simply as follows:
Lottery-lottery dominance: Let A,B∈Δ2C be lottery-lotteries, and let V∈ΔVC. Then we say that A dominates B if Gv∈VEa∼∼Av(a)≥Gv∈VEb∼∼Bv(b), and we write A⪰B.[24]
A maximal lottery-lottery is some lottery-lottery M∈Δ2C such that for all L∈Δ2C,M⪰L.
The thing is, we don’t necessary need to steal quite as much strength as that from the Condorcet criterion without giving anything back. Here’s the first major piece of work likely purely my own: the definition of the lottery-Smith criterion.
Lottery-Smith: Divide the candidate-lottery set into candidate-lotteries which should maybe win (frontrunner/lottery-Smith set) and candidate-lotteries who should definitely not win (lottery-dregs), such that any candidate from the frontrunner set beats every candidate from the lottery-dregs set. Then the winner overall should be some distribution over the lottery-Smith set, and in particular, one which is equivalent-in-expectation to one of the distributions over the candidate set we’ve decided is acceptable—we shouldn’t care about telling apart lottery-lotteries that reduce to the same lottery.
Equivalently, let V∈ΔVC be an electorate, and separate ΔC as ΔC=SΔ⊔DΔ, such that ∀S∈SΔ,D∈DΔ, Pv∼V(Es∼Sv(s)≥Ed∼Dv(d))>12. Then a lottery-Smith f satisfies fC(V)(SΔ)=1, that is, fC(V)∈ΔSΔ⊆Δ2C.
In the same way that when we passed from Condorcet to lottery-Condorcet, we got a much stronger precondition, the lottery-Smith condition has a much stronger precondition than the classical Smith condition. Accordingly, checks for it will come up less often and have less of an effect on outcomes: now a potentially lottery-Smith-favored candidate has to belong to the lottery-Smith set—not just the Smith set—before it can be crowned an Obvious Strong Candidate which is worth including in some of the lotteries that make up a lottery-Smith lottery-lottery.
But on the other hand, the lottery-Smith condition also has a weaker precondition than lottery-Condorcet—just as the Smith condition has a weaker precondition to check than the Condorcet condition—and it provides (barely) stronger backup guarantees than lottery-Condorcet—just as the Smith criterion provides (barely) stronger backup guarantees than the Condorcet condition does.
Overall, as far as how the various “candidate-sorting” properties match up against each other:
For how much implied control a property has over electoral outcomes in the average case, assuming it gets to have any:
Smith > Condorcet >>> Lottery-Smith >> Lottery-Condorcet
For what properties logically entail which others:
Lottery-Smith ⪰ [Lottery-Condorcet ⊥ Smith] ⪰ Condorcet
What do lottery-Smith sets have to look like, apart from the final-probabilities condition? Well, good news...
...and bad news...
...and good news for people who like bad news.
We say specifically “utility-maximal” or “utility-nonmaximal” here because the conditions for maximality for candidates and lotteries are now meaningfully different from the maximality condition for lotteries—in particular, utility-maximality is transitive.
As a relevant observation, we should recall that voters vote based on expected utility, and so it can sometimes be more natural to think of elements of ΔC as being the points of a simplex, rather than distinguish lotteries with identical expected outcomes. That way, we can think of our voting system and electorate as giving us a map fV:ΔC→[0,1],L↦Gv∈VEc∼Lv(c) from the simplex to the unit interval; to some extent, all we care about is the preimage of expected utilities in [0,1].[27][28] Ties are a lot more common for lottery-lotteries,[29] and that’s before even considering the rare random tie grounded in some indifference of outcomes which just happen to have the same expected values under the electorate-utility-function—in such a case, two such possible maximal lottery-lotteries need not even share any overlap in the set of partitions of unity realized by subsets.[30] Thankfully, the latter ties depend on the voterbase and are thus likely not generic.[31]
Working over lotteries instead of lottery-lotteries has no equivalent to this extra reduction step, either way we might slice it. We didn’t need either half of this extra step, since maximal lotteries already are their own final probabilities over candidates, and we were totally fine with the maximal lottery being a lottery over several candidates, because there wasn’t an additional level to have to think about or reduce from. But now, not only is it possible for any final distribution over candidates to be realized in entire families of different equally valid ways, but it’s not even guaranteed that the voterbase-favored effective final lottery over candidates is unique! I am almost certain that this is some substantial part of the true reason for the ‘fractal structure due to Nash-equilibrium constraints’ Scott posits.[32] Additionally, because ΔC,Δ2C are compact and can be realized as simplices in Euclidean space, and because the geometric-expected-value evaluation maps eΔ:(ΔC,VC)→[0,1],(L,V)↦Gv∈VEc∼Lv(c), eΔ2:(Δ2C,VC)→[0,1],(L,V)↦Gv∈VEc′∼∼Lv(c′) are continuous for any finite choice of voterbase,[33] we know that by the extreme value theorem, these maps achieve their respective maxima somewhere on their domains. Accordingly, we’re fine to just assume that they exist, but will generally not be unique.
This all presents a strong impression of what a maximal lottery-lottery would have to be—some geometrically-maximizing lottery over candidate-lotteries which are themselves game-theoretically-dominant lotteries over Smith candidates (who are themselves game-theoretically non-dominated).
Maximal Lottery-Lotteries Exist
Better-apprised of the benefits of geometric maximization, a little more effort now results in not only the proof we were hoping for earlier, but a substantially stronger existence result.[34]
This looks like it’s just a patch to the earlier lemma to make it into an existence proof merely by taking the abandonment of ordinality maximally seriously. It is not. It rests heavily on the revised definition of lottery-lottery dominance I considered it justified to pick, given our other philosophical desiderata for voting systems and our abandonment of ordinality and homogeneity.
In particular, we can characterize lottery-lotteries not only by their final probabilities over candidates, but also by how much probability they allocate to each candidate-tagged partition of unity, where each of the candidate-tagged partitions of unity would result in the same geometric-expected payoff as any other.
Now, existence results are near and dear to my heart as a mathematician by inclination and by training both, but I also believe in winning maximally and, on finding checkmate, looking for better. So it’s certainly better than nothing… but it’s still not good enough. We want a more constructive process or some characterizing set of theorems.
And we’ll get to that! Although a natural semi-constructive option should be jumping out at us now,[35] we first need a better understanding of how precisely a maximal lottery-lottery would have to handle joins of populations if we want to be sure that this definition of a maximal lottery-lottery holds up the way we should want it to for joins of populations, and so my definition of the modularity criterion, which replaces both participation and consistency, will have to wait for the next post, as will my construction(s) of a maximal lottery-lottery. However, this seems like a natural point to break the post—some light experimentation suggested that if I make this single post any longer, anyone reading it will rapidly fall off in energy and interest, and that’s not what I want.[36] So I’ll end this post with the pure existence result, and when you feel up to reading about how I define modularity and how I construct maximal lottery-lotteries, click here.
Notation Reference
Throughout this post, I consistently use the following font conventions for mathematical objects:
Lowercase letters in normal Computer Modern (like a,b) are always candidates or voters.
Uppercase letters in normal Computer Modern (like A,B) are always lotteries or voterbases.
Blackboard bold is only ever used as an object for the voterspace VC. It also gets reserved for use as the expected value symbol E and the geometric mean symbol G.
Fraktur is only ever used for the candidate set C and the dregs set D. I would also have used it for the Smith set S, but \frak{S} is famously bad. I thought it was a G for years until grad school because it used to be standard for the symmetry group on n letters. Seriously, just look at it: S.
The calligraphic font is only ever used for lottery-lotteries (like A,B) and for the Smith set S.
Here’s a guide to the standard mathematical notation I use extensively here:
Morphism/Hom-sets - Hom(A,B) - the set of functions from A to B
Disjoint union - A⊔B - the union of the sets A and B, which are either implicitly assumed to be disjoint or else elements in the union are tagged with their set of origin; sometimes used to implicitly partition another set (e.g. S=A⊔B⊔C; C=S⊔D (the partition of the candidate set into Smith and dregs))
Sampling/drawing from a probability distribution - x∼X - x is a random draw from X
Sampling/drawing from a lottery-lottery - X∼X; x∼∼X - X is a randomly drawn lottery from X; x is a random draw from a randomly drawn lottery from X
Probability notation - Px∼Xe(x) - the probability of e(x) happening, where x is drawn from X
Expected value notation - Ex∼Xf(x) - the expected value of f(x), where x is drawn from X
Geometric mean notation - Gx∈Xf(x) - the geometric mean of the values of f(x), where x ranges over X
Weighted geometric mean notation - Gp(x,y):=xpy1−p - the p-weighted geometric mean of x,y
Candidate sets - c∈C - an set of arbitrary elements (candidates) and possibly lotteries on those candidates
Smith sets and dregs - C=S⊔D - the candidate set is partitionable into a Smith set (guaranteed to be nonempty) and a dregs set (about which there are no guarantees)
Candidate lotteries - ΔC - the set of candidate-tagged partitions of unity; the set of probability distributions of the set of candidates
Utility-functions - VC≅Hom(C,[0,1]) - the set of functions from the set of candidates to the unit interval[37]
Voterbases/electorates - V∈ΔVC - vote shares (or probability distributions) over the set of utility functions on candidates
Candidate lottery-lotteries/distribution-distributions - Δ2C - the set of candidate-tagged-partition-of-unity-tagged partitions of unity; the set of probability distributions of probability distributions of the set of candidates[38]
Domination - A⪰B; A⪰B - object A dominates object B; as an abuse of notation, property A logically entails property B.
The quick version, which I detail slightly more at the end of Lottery Outcomes above and which Scott writes up in Maximal Lotteries, is that if you allow for the possibility of lottery outcomes (which don’t even have to crop up most of the time), you can have voting systems that defy Arrow’s impossibility theorem by being both Condorcet and consistent.
This is fully general advice for reading math papers or anything with lots of abstract quantitative content to it, and is meant for almost literally anyone who hasn’t taken an proof-based course in math. I would provide it almost verbatim to anyone who asked me for my professional advice on how to avoid bouncing off a math paper or how to successfully read one. Every time I neglect it myself when reading math, I end up wasting time and regretting it and cursing my folly as I pick thorns and bramble out of my face. It is good advice; bitter like leafy greens. Please take it.
If you’d rather think of it as a multiset, that’s cool too. That technically violates homogeneity, but we’ll actually have to massively weaken that assumption later on anyway when we start talking about conditions on outcomes over joins of electorates, so it’s really down to taste and which way of thinking about, seeing, and manipulating the underlying objects comes most naturally to you. They’re basically equivalent.
Yes, you read that correctly. A distribution. Like, a probability distribution. And yes, the “determinism” condition introduced immediately afterwards completely obviates this setup—the output distribution is required to assign all of its probability-mass to a single outcome—such that the move to lottery outcomes is extremely telegraphed.
Scott doesn’t give these properties names in his writeup and I can’t find standard names for them anywhere else. I’ve thus chosen to given them evocative names for myself.
“Whenever we pick a random voter with utility function v from the electorate V, if the probability that v ranks c above d is always >12, no matter the d, that means that a majority of the electorate prefers c to d, and fC(V)(c) therefore assigns weight 1 to candidate c as the Obvious Correct Champion.”
In Scott’s writeup and likely elsewhere, this is called “clone-invariance” (also known as independence of clones), which is actually a slightly weaker form of the criterion found in Arrow’s impossibility theorem, independence of irrelevant alternatives (IIA). Cousin-properties also weaker than IIA include local independence (deleting the winner and deleting the last-place loser must each leave the other candidates’ relative finishes unchanged), independence of worst candidates (deleting the n worst candidates must leave the other candidates’ finishes unchanged), and Smith independence (if for some proper subset of the candidates (the Frontrunners), every candidate in the set wins all its head-to-head matchups against every candidate outside the set (the Dregs), then deleting all of the Dregs must leave the Frontrunners’ finishes unchanged).
You know, the same way that toruses would prefer to live in R4 so they can lie everywhere-locally flat, and how all nitrogen’s favorite home in chemical-space is as nice stable dinitrogen, and how rays of light really do want to move in straight lines, honest, it’s just all this rest mass hanging around distorting the metric.
In one very austere sense, this is all that need be said in this entire post: change the dominance criterion from game-theoretic to Nash-bargaining, as realized by a geometric mean. The rest is careful commentary.
Unfortunately, the lottery-Condorcet condition is way weaker than the ordinary Condorcet criterion, so unlike for maximal lotteries, we’re likely going to have to break out the weird-shaped dice and the spinners and the quantum-random bits and such.
Strictly speaking, this actually describes something determined by the weight value p - that is, q=1−pp, which ranges over 0≤q<∞ when 0<p≤1. This way, though, we don’t have to even start relitigating (in the prose commentary!) whether or not it makes sense to talk about arbitrarily small positive numbers.
Like, for instance, basing anything at all off of the last digit of the number of votes cast for candidate c.
Why the factors of 2 in the exponents under the square root? Because this needs to be a straightforward generalization of the unweighted geometric mean. Try it yourself for p=12.
Unless you like nonprincipal ultrafilters way more than I do on any given day, anyway. Most of the time, I find them spooky, and I generally require extra caffeine before I’m willing to deal with them.
Seriously, go read The Lottery in Babylon. It’s like four pages long, very fun, and exquisitely thematically relevant to us here. In fact, it would make an excellent palate cleanser for between the two posts.
We should treat this as a regularity condition on the voters and the candidates both, given that it implies that a voter won’t/can’t have utility 0 on more than one candidate. It’s not that strong a condition—if a significant number of voters maximally hate the entire candidate set, that implies problems about the candidate set that no clever voting system can fix.
SCENE: Daytime, exterior. Standing in a row are the candidates for the upcoming election for World Ruler: Alice, Bob, Claire, a twenty-foot-tall thin and misshapen figure in a large trench-coat parts of which are partially transparent, Daniel, and Eve. They have come before The World Electoral Commission, which has controversially banned lottery-duplicate candidates, and which has gathered these notables to question them on suspicion of an illicit lottery-duplicate candidate having managed to register anyway. Tension mounts as the questioning enters its second hour—they’re in for a long day.
In open defiance of the overthrown World Electoral Commission’s ancien regime.
Gentlemen, ladies, both, and neither, restrain your shocked gasps, I beg you!
We might also want for whatever the function on expected voter utilities to be effectively computable, given those utilities, and also compositional over joins of electorates. Unlike the more important properties listed in the text proper, the arithmetic mean definition would satisfy both of these and the game-theoretic definition only sometimes satisfies the latter.
Here’s a sketch of yet another way of convincing yourself that we don’t need to worry too much about abandoning strategyproofness as dominance criterion. It’ll require you to think of our voting systems in the frankly pseudohomogeneous case—i.e., as being about a multiset of utility functions—and it also won’t make sense until you’ve read about modularity in the next post, but if you don’t want to go read that, it’ll be fine to think of a modular voting system as a voting system that’s almost consistent in a way that plays nicely with joining populations with weighted geometric means.
The key things to notice are that a voter’s preferences over lotteries are entirely determined by their preferences over candidates, that they assign distinct nonnegative utilities to each candidate, and thus that no clever voting plan gives any individual voter more control over outcomes, after the geometric averaging, than any other. The first two mean that there’s at most a single lottery that any voter has expected value constant-0 on: the one assigning probability 1 to whichever candidate that voter likes least. In particular, a voter can’t rig their utility function so as to claim to assign utility 0 to almost all lotteries, as would be an obvious strategy for hijacking an electorate, and no voter has more control over outcomes than any other, as we’ll see more explicitly when I sketch a construction of maximal lottery lotteries. In some sense, none of the sharply dominating voting strategies you could implement for rigging an election over candidate-lotteries—even assuming the election takes voters’ utility functions into account—work for dominating voting in a maximal lottery-lottery, or even a modular lottery-lottery, because they’re not even in the strategy set available to the voters. As a direct result, the only plays available to individual voters are the “fair” ones implying linearly self-consistent things about their own preferences, so that every voter has the same amount of steering power over outcomes.
And why should it? It doesn’t characterize Nash bargaining. Geometric means do. End of.
By contrast, the lottery-outcome given by random-dictatorship of subpopulations, (0.51A,0.49C), has geometric-expected utility (0.51⋅0.6+0.49⋅0)0.51⋅(0.51⋅0+0.49⋅1)0.49 =0.3060.51⋅0.490.49∼0.3854 (and arithmetic-expected utility ∼0.411), a massive improvement over either one!
Yes, I’m still going to write ⪰ instead of ≥. No, I don’t care that this is now a total order induced by the canonical pullback of the canonical order on the reals. I want a symbol specifically for dominance-and-maybe-not-literally-greater-than, and I’m the one writing the post.
A more formal proof:
Generally, the lottery-Smith set will be some convex subset of ΔS. In fact, for ⊆c taken to mean “is a convex subset of, within the appropriate domain”, S⊆cC, SΔ⊆cΔS⊆cΔC.
We use the geometric mean over voters here (weighted by vote-share) instead of taking the arithmetic expected value sampling over the voterbase because later on, we’ll use geometric maximization extensively for its compositional properties and fairness guarantees. I decided to include it in the definition here rather than keep it a mystery until after I talk about geometric maximization for the sake of clarity. If you don’t know yet what’s going on with it, either don’t worry about it for now or just skip to the section about geometric maximization and come back.
Also: these preimage sets might be submanifolds of the simplex? I’m not quite sure to what extent the submersion theorem applies here.
Take the example (for the maximally muddled 3-voter electorate) of the lottery-lotteries (13(1A),13(1B),13(1C)) and (16(1A),16(1B),16(1C),12(13A,13B,13C)) - and the entire rest of the family (p3(1A),p3(1B),p3(1C),3−3p(13A,13B,13C)), for 0≤p≤1, including (1(13A,13B,13C)). All of these are equivalent to the same maximal lottery-lottery—the one for which U={(13A,13B,13C)} - and these aren’t even all of the lotteries that can show up for the maximal lottery-lotteries for this voterbase.
Here’s a very simple example of this kind of failure happening maximally badly in action: take a candidate set of just three candidates, A,B,C, and a voterbase of just two voters, v,w, such that v(A)=1,v(B)=0,v(C)=12 and w(A)=0,w(B)=1,w(C)=12. That is - C is a compromise candidate. Then any of the family of lotteries given by (p(12A,12B),(1−p)(1C)),0≤p≤1 all tie each other for expected utility, despite giving any candidate probability to C you like! And it should—for all intents and purposes, C is indistinguishable from (12A,12B).
I can just hear the prop-collision noises of the resulting z-fighting.
The rest is covered by the compositionality you get from the use of weighted geometric means, and the equivalence between the “mash them all together immediately” and “assembly by means of full binary tree” approaches that I describe in the next post.
To see this for the lottery case, note that for any voter v∈V, Ec∈Lv(c) is continuous over L∈ΔC, and then we take the geometric mean of a countable number of continuous functions, the result of which is still continuous under the appropriate metric. (If you don’t like that, pass to finite instead. Its support is also limited to parts of ΔC where no voter has expected utility 0.) The lottery-lottery case is identical, except that we first reduce to a lottery under the usual map L↦Pc∼∼L(c=ci). All of this probably also works for the case of countably many voters, but I don’t know how to prove that, and anyway we’ll see later that we can pretty much just pass to the case where |V|<∞. This kind of approach also lets us trivially prove things like “every continuous map from lottery-lotteries to lottery-lotteries has at least one fixed point; this includes every continuous map which is weakly increasing in geometric expectation of utility”, which you may spot as yet another avenue I might have picked to prove the existence of maximal lottery-lotteries. I don’t actually know if that proof sketch works, though.Δ2C would, at the absolute least, have to be endowed with an appropriately-motivated topology.
I’m not going to lie, when I set out to prove something like this theorem, I wasn’t expecting to prove anything more than that a maximal lottery-lottery would have to be lottery-Smith. I totally underestimated this approach, judging that no way could just patching the lemma from earlier work. Turns out: full utility data is really powerful.
I’ve described it elsewhere in the post and so won’t here. If that option isn’t jumping out at you… I want to say “you should think about it more carefully” but really I don’t know how hard it is to see the answer. I think it’s worth taking a literal clock-time five minutes to see if you can spot it; and if you still can’t, fair play—but better if you make your own guesses first.
Water/a drink, caffeine/meds, maybe a light snack, definitely a stretch and a pace.[15] That’s my recommendation, anyway. The second post will almost certainly still be there after you’re done and you’ll be better suited to process it.
Scott doesn’t see a need for the subscript here and thus suppresses it in his notation.
Scott notates this one more literally as ΔΔC.