Polymath-style attack on the Parliamentary Model for moral uncertainty
Thanks to ESrogs, Stefan_Schubert, and the Effective Altruism summit for the discussion that led to this post!
This post is to test out Polymath-style collaboration on LW. The problem we’ve chosen to try is formalizing and analyzing Bostrom and Ord’s “Parliamentary Model” for dealing with moral uncertainty.
I’ll first review the Parliamentary Model, then give some of Polymath’s style suggestions, and finally suggest some directions that the conversation could take.
The Parliamentary Model
The Parliamentary Model is an under-specified method of dealing with moral uncertainty, proposed in 2009 by Nick Bostrom and Toby Ord. Reposting Nick’s summary from Overcoming Bias:
Suppose that you have a set of mutually exclusive moral theories, and that you assign each of these some probability. Now imagine that each of these theories gets to send some number of delegates to The Parliament. The number of delegates each theory gets to send is proportional to the probability of the theory. Then the delegates bargain with one another for support on various issues; and the Parliament reaches a decision by the delegates voting. What you should do is act according to the decisions of this imaginary Parliament. (Actually, we use an extra trick here: we imagine that the delegates act as if the Parliament’s decision were a stochastic variable such that the probability of the Parliament taking action A is proportional to the fraction of votes for A. This has the effect of eliminating the artificial 50% threshold that otherwise gives a majority bloc absolute power. Yet – unbeknownst to the delegates – the Parliament always takes whatever action got the most votes: this way we avoid paying the cost of the randomization!)
The idea here is that moral theories get more influence the more probable they are; yet even a relatively weak theory can still get its way on some issues that the theory think are extremely important by sacrificing its influence on other issues that other theories deem more important. For example, suppose you assign 10% probability to total utilitarianism and 90% to moral egoism (just to illustrate the principle). Then the Parliament would mostly take actions that maximize egoistic satisfaction; however it would make some concessions to utilitarianism on issues that utilitarianism thinks is especially important. In this example, the person might donate some portion of their income to existential risks research and otherwise live completely selfishly.
I think there might be wisdom in this model. It avoids the dangerous and unstable extremism that would result from letting one’s current favorite moral theory completely dictate action, while still allowing the aggressive pursuit of some non-commonsensical high-leverage strategies so long as they don’t infringe too much on what other major moral theories deem centrally important.
In a comment, Bostrom continues:
there are a number of known issues with various voting systems, and this is the reason I say our model is imprecise and under-determined. But we have some quite substantial intuitions and insights into how actual parliaments work so it is not a complete black box. For example, we can see that, other things equal, views that have more delegates tend to exert greater influence on the outcome, etc. There are some features of actual parliaments that we want to postulate away. The fake randomization step is one postulate. We also think we want to stipulate that the imaginary parliamentarians should not engage in blackmail etc. but we don’t have a full specification of this. Also, we have not defined the rule by which the agenda is set. So it is far from a complete formal model.
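To fix ideas, here is a minimal Python sketch of the parts of the mechanism that are pinned down (credence-proportional delegate allocation and the fake-randomization vote). The theory names, credences, and the single choice of action are made-up illustrations, not part of Bostrom and Ord’s proposal.

def allocate_delegates(credences, n_delegates=100):
    # Each theory gets delegates in proportion to its probability.
    return {theory: round(p * n_delegates) for theory, p in credences.items()}

def parliament_decision(delegates, votes_by_theory):
    # votes_by_theory maps each theory to the action its delegates vote for.
    tally = {}
    for theory, action in votes_by_theory.items():
        tally[action] = tally.get(action, 0) + delegates[theory]
    total = sum(tally.values())
    # Delegates bargain as if P(action taken) were proportional to its vote share...
    perceived = {action: votes / total for action, votes in tally.items()}
    # ...but the parliament actually takes whichever action got the most votes.
    chosen = max(tally, key=tally.get)
    return chosen, perceived

delegates = allocate_delegates({"total utilitarianism": 0.1, "egoism": 0.9})
print(parliament_decision(delegates, {"total utilitarianism": "donate", "egoism": "spend"}))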
It’s an interesting idea, but clearly there are a lot of details to work out. Can we formally specify the kinds of negotiation that delegates can engage in? What about blackmail or prisoners’ dilemmas between delegates? In what ways does this proposed method outperform other ways of dealing with moral uncertainty?
I was discussing this with ESRogs and Stefan_Schubert at the Effective Altruism summit, and we thought it might be fun to throw the question open to LessWrong. In particular, we thought it’d be a good test problem for a Polymath-project-style approach.
How to Polymath
The Polymath comment style suggestions are not so different from LW’s, but numbers 5 and 6 are particularly important. In essence, they point out that the idea of a Polymath project is to split up the work into minimal chunks among participants, and to get most of the thinking to occur in comment threads. This is as opposed to a process in which one community member goes off for a week, meditates deeply on the problem, and produces a complete solution by themselves. Polymath rules 5 and 6 are instructive:
5. If you are planning to think about some aspect of the problem offline for an extended length of time, let the rest of us know. A polymath project is supposed to be more than the sum of its individual contributors; the insights that you have are supposed to be shared amongst all of us, not kept in isolation until you have resolved all the difficulties by yourself. It will undoubtedly be the case, especially in the later stages of a polymath project, that the best way to achieve progress is for one of the participants to do some deep thought or extensive computation away from the blog, but to keep in the spirit of the polymath project, it would be good if you could let us know that you are doing this, and to update us on whatever progress you make (or fail to make). It may well be that another participant may have a suggestion that could save you some effort.
6. An ideal polymath research comment should represent a “quantum of progress”. On the one hand, it should contain a non-trivial new insight (which can include negative insights, such as pointing out that a particular approach to the problem has some specific difficulty), but on the other hand it should not be a complex piece of mathematics that the other participants will have trouble absorbing. (This principle underlies many of the preceding guidelines.) Basically, once your thought processes reach a point where one could efficiently hand the baton on to another participant, that would be a good time to describe what you’ve just realised on the blog.
It seems to us as well that an important part of the Polymath style is to have fun together and to use the principle of charity liberally, so as to create a space in which people can safely be wrong, point out flaws, and build up a better picture together.
Our test project
If you’re still reading, then I hope you’re interested in giving this a try. The overall goal is to clarify and formalize the Parliamentary Model, and to analyze its strengths and weaknesses relative to other ways of dealing with moral uncertainty. Here are the three most promising questions we came up with:
What properties would be desirable for the model to have (e.g. Pareto efficiency)?
What should the exact mechanism for negotiation among delegates be?
Are there other models that are provably dominated by some nice formalization of the Parliamentary Model?
The original OB post had a couple of comments that I thought were worth reproducing here, in case they spark discussion, so I’ve posted them.
Finally, if you have meta-level comments on the project as a whole instead of Polymath-style comments that aim to clarify or solve the problem, please reply in the meta-comments thread.
Consider the following degenerate case: there is only one decision to be made, and your competing theories assess it as follows.
Theory 1: option A is vastly worse than option B.
Theory 2: option A is just a tiny bit better than option B.
And suppose you find theory 2 just slightly more probable than theory 1.
Then it seems like any parliamentary model is going to say that theory 2 wins, and you choose option A. That seems like a bad outcome.
Accordingly, I suggest that to arrive at a workable parliamentary model we need to do at least one of the following:
Disallow degenerate cases of this kind. (Seems wrong; e.g., suppose you have an important decision to make on your deathbed.)
Bite the bullet and say that in the situation above you really are going to choose A over B. (Seems pretty terrible.)
Take into account how strongly the delegates feel about the decision, in such a way that you’d choose B in this situation. (Handwavily it feels as if any way of doing this is going to constrain how much “tactical” voting the delegates can engage in.)
As you might gather, I find the last option the most promising.
Great example. As an alternative to your three options (or maybe this falls under your first bullet), maybe negotiation should happen behind a veil of ignorance about what decisions will actually need to be made; the delegates would arrive at a decision function for all possible decisions.
Your example does make me nervous, though, on the behalf of delegates who don’t have much to negotiate with. Maybe (as badger says) cardinal information does need to come into it.
Yes, I think we need something like this veil of ignorance approach.
In a paper (preprint) with Ord and MacAskill we prove that for similar procedures, you end up with cyclical preferences across choice situations if you try to decide after you know the choice situation. The parliamentary model isn’t quite within the scope of the proof, but I think more or less the same proof works. I’ll try to sketch it.
Suppose:
We have equal credence in Theory 1, Theory 2, and Theory 3
Theory 1 prefers A > B > C
Theory 2 prefers B > C > A
Theory 3 prefers C > A > B
Then in a decision between A and B there is no scope for negotiation, so since two of the theories prefer A, the parliament will prefer A. Similarly, in a choice between B and C the parliament will prefer B, and in a choice between C and A the parliament will prefer C, giving a cycle.
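A quick sanity check of this cycle, as a minimal Python sketch; it assumes (as above) that with no scope for negotiation the equally-weighted parliament just follows pairwise majority:

from itertools import combinations

# Each theory's ranking, best first; equal credences mean equal weight.
rankings = {
    "Theory 1": ["A", "B", "C"],
    "Theory 2": ["B", "C", "A"],
    "Theory 3": ["C", "A", "B"],
}

def pairwise_winner(x, y):
    # Return whichever of x and y a majority of theories rank higher.
    votes_for_x = sum(r.index(x) < r.index(y) for r in rankings.values())
    return x if votes_for_x > len(rankings) / 2 else y

for x, y in combinations("ABC", 2):
    print(f"{x} vs {y}: parliament prefers {pairwise_winner(x, y)}")
# A beats B, B beats C, and C beats A: a cycle.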
This seems really similar to the problem Knightian uncertainty attempts to fix.
I think So8res’s solution is essentially your option 3, with the strength of the disagreements being taken into account in the utility function, and then once you really have everything you care about accounted for, then the best choice is the standard one.
I agree that some cardinal information needs to enter in the model to generate compromise. The question is whether we can map all theories onto the same utility scale or whether each agent gets their own scale. If we put everything on the same scale, it looks like we’re doing meta-utilitarianism. If each agent gets their own scale, compromise still makes sense without meta-value judgments.
Two outcomes is too degenerate a case if agents get their own scales, so suppose A, B, and C are the options, theory 1 has ordinal preferences B > C > A, and theory 2 has preferences A > C > B. Depending on how much of a compromise C is for each agent, the outcome could vary between
choosing C (say if C is 99% as good as the ideal for each agent),
a 50⁄50 lottery over A and B (if C is only 1% better than the worst for each), or
some other lottery (for instance, 1 thinks C achieves 90% of B and 2 thinks C achieves 40% of A. Then, a lottery with weight 2/3rds on C and 1/3rd on A gives them each 60% of the gain between their best and worst)
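A minimal check of that last case (Python; utilities are normalised so each theory’s best option is worth 1 and its worst 0, as in the description above):

# Theory 1: B > C > A with C worth 90% of the best; Theory 2: A > C > B with C worth 40%.
u1 = {"A": 0.0, "B": 1.0, "C": 0.9}
u2 = {"A": 1.0, "B": 0.0, "C": 0.4}

def expected_utility(u, lottery):
    # lottery maps options to probabilities summing to 1.
    return sum(p * u[option] for option, p in lottery.items())

lottery = {"C": 2 / 3, "A": 1 / 3}
print(expected_utility(u1, lottery), expected_utility(u2, lottery))
# Both are approximately 0.6: each theory gets 60% of the gain between its worst and best.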
A possible (but I admit, quite ugly) workaround: whenever there are very few decisions to be made introduce dummy bills that would not be actually carried out. MPs wouldn’t know about their existence. In this case Theory 1 might be able to negotiate their way into getting B.
My reading of the problem is that a satisfactory Parliamentary Model should:
Represent moral theories as delegates with preferences over adopted policies.
Allow delegates to stand up for their theories and bargain over the final outcome, extracting concessions on vital points while letting other policies slide.
Restrict delegates’ use of dirty tricks or deceit.
Since bargaining in good faith appears to be the core feature, my mind immediately goes to models of bargaining under complete information rather than voting. What are the pros and cons of starting with the Nash bargaining solution as implemented by an alternating offer game?
The two obvious issues are how to translate delegates’ preferences into utilities and what the disagreement point is. Assuming a utility function is fairly mild if the delegate has preferences over lotteries. Plus, there’s no utility comparison problem even though you need cardinal utilities. The lack of a natural disagreement point is trickier. What intuitions might be lost going this route?
I think there’s a fairly natural disagreement point here: the outcome with no trade, which is just a randomisation of the top options of the different theories, with probability according to the credence in that theory.
One possibility to progress is to analyse what happens here in the two-theory case, perhaps starting with some worked examples.
Alright, a credence-weighted randomization between ideals and then bargaining on equal footing from there makes sense. I was imagining the parliament starting from scratch.
Another alternative would be to use a hypothetical disagreement point corresponding to the worst utility for each theory, and give higher credence theories more bargaining power. Or the disagreement point could be a typical person’s life (the outcome can’t be worse for any theory than a policy of being kind to your family, giving to socially-motivated causes, cheating on your taxes a little, telling white lies, and not murdering).
In the set-up we’re given the description of what happens without any trade—I don’t quite see how we can justify using anything else as a defection point.
I think the Nash bargaining solution should be pretty good if there are only two members of the parliament, but it’s not clear how to scale up to a larger parliament.
For the NBS with more than two agents, you just maximize the product of everyone’s gain in utility over the disagreement point. For Kalai-Smorodinsky, you continue to equate the ratios of gains, i.e. picking the point on the Pareto frontier on the line between the disagreement point and the vector of ideal utilities.
Agents could be given more bargaining power by giving them different exponents in the Nash product.
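A minimal numerical sketch of the weighted Nash product (Python, assuming scipy is available; the quarter-circle feasible set, disagreement point, and weights are illustrative placeholders rather than anything from the thread):

import numpy as np
from scipy.optimize import minimize_scalar

def frontier(t):
    # Pareto frontier of a toy feasible set {x^2 + y^2 <= 1, x, y >= 0},
    # parametrised by an angle t in (0, pi/2).
    return np.cos(t), np.sin(t)

def weighted_nash_point(disagreement, weights):
    # Maximise (x - d1)^w1 * (y - d2)^w2 over the frontier (asymmetric NBS).
    d1, d2 = disagreement
    w1, w2 = weights
    def neg_log_nash(t):
        x, y = frontier(t)
        if x <= d1 or y <= d2:
            return 1e12  # exclude points that don't improve on the disagreement point
        return -(w1 * np.log(x - d1) + w2 * np.log(y - d2))
    result = minimize_scalar(neg_log_nash, bounds=(1e-6, np.pi / 2 - 1e-6), method="bounded")
    return frontier(result.x)

print(weighted_nash_point((0.0, 0.0), (1, 1)))  # equal bargaining power
print(weighted_nash_point((0.0, 0.0), (2, 1)))  # the first delegate's exponent doubled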
Giving them different exponents in the Nash product has some appeal, except that it does seem like NBS without modification is correct in the two-delegate case (where the weight assigned to the different theories is captured properly by the fact that the defection point is more closely aligned with the view of the theory with more weight). If we don’t think that’s right in the two-delegate case we should have some account of why not.
The issue is when we should tilt outcomes in favor of higher credence theories. Starting from a credence-weighted mixture, I agree theories should have equal bargaining power. Starting from a more neutral disagreement point, like the status quo actions of a typical person, higher credence should entail more power / votes / delegates.
On a quick example, equal bargaining from a credence-weighted mixture tends to favor the lower credence theory compared to weighted bargaining from an equal status quo. If the total feasible set of utilities is {(x,y) | x^2 + y^2 ≤ 1; x,y ≥ 0}, then the NBS starting from (0.9, 0.1) is about (0.95, 0.28) and the NBS starting from (0,0) with theory 1 having nine delegates (i.e. an exponent of nine in the Nash product) and theory 2 having one delegate is (0.98, 0.16).
If the credence-weighted mixture were on the Pareto frontier, both approaches are equivalent.
Update: I now believe I was over-simplifying things. For two delegates I think it is correct, but in the parliamentary model that corresponds to giving the theories equal credence. As credences vary so do the numbers of delegates. Maximising the Nash product over all delegates is equivalent to maximising a product where the theories have different exponents (exponents in proportion to the number of delegates).
In order to get a better handle on the problem, I’d like to try walking through the mechanics of a how a vote by moral parliament might work. I don’t claim to be doing anything new here, I just want to describe the parliament in more detail to make sure I understand it, and so that it’s easier to reason about.
Here’s the setup I have in mind:
let’s suppose we’ve already allocated delegates to moral theories, and we’ve ended up with 100 members of parliament, MP_1 through MP_100
these MPs will vote on 10 bills B_1 through B_10 that will each either pass or fail by majority vote
each MP M_m has a utility score U_m,b for each bill B_b passing (and assigns zero utility to the bill failing, so if they’d rather the bill fail, U_m,b is negative)
the votes will take place on each bill in order from B_1 to B_10, and this order is known to all MPs
all MPs know each other’s utility scores
Each MP wants to maximize the utility of the results according to their own scores, and they can engage in negotiation before the voting starts to accomplish this.
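For concreteness, here is a minimal encoding of this setup in Python/numpy (the utility scores are random placeholders, and no negotiation is modelled yet, only sincere majority voting):

import numpy as np

rng = np.random.default_rng(0)

N_MPS, N_BILLS = 100, 10
# U[m, b] is MP m's utility for bill b passing; failing is fixed at 0,
# so a negative entry means that MP would rather the bill fail.
U = rng.normal(size=(N_MPS, N_BILLS))

def sincere_votes(U):
    # Without negotiation, each MP votes 'aye' exactly when they prefer the bill to pass.
    return U > 0

def outcomes(votes):
    # A bill passes if a strict majority of MPs vote for it.
    return votes.sum(axis=0) > votes.shape[0] / 2

def realised_utilities(U, passed):
    # Each MP's total utility is the sum of their scores for the bills that passed.
    return U[:, passed].sum(axis=1)

passed = outcomes(sincere_votes(U))
print(passed)
print(realised_utilities(U, passed)[:5])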
Does this seem to others like a reasonable description of how the parliamentary vote might work? Any suggestions for improvements to the description?
If others agree that this description is unobjectionable, I’d like to move on to discussing negotiating strategies the MPs might use, the properties these strategies might have, and whether there are restrictions that might be useful to place on negotiating strategies. But I’ll wait to see if others think I’m missing any important considerations first.
This looks reasonable to analyse (although I’d be interested in analysing other forms too).
I’d be tempted to start with a simpler example to get complete analysis. Perhaps 2 bills and 2 MPs. If that’s easy, move to 3 MPs.
It seems like votes should be considered simultaneously to avoid complex alliances of the form: I will vote on B4 in the direction you like if you vote on B3 in the direction I like, but this is only possible in one direction WRT time. Having such an ordering and resulting negotiations means that some agents have an incentive to bargain for moving the location of a bill. It seems better to be able to make all such Bx vote for By vote trades. I’m not familiar enough with voting models to know the tradeoffs for a simultaneous system though.
An alternative is to say that only one of the votes actually occurs, but which it is will be chosen randomly.
A very quick thought about one type of possible negotiating strategy. A delegate might choose* a subset of bills, choose* another delegate to approach, and offer the usual cake-cutting game for two players, in which the first delegate divides that subset into two “piles” and allows the second delegate to choose one of them. Then they each would decide how to vote on the bills from their respective “piles” and promise to vote in accordance with each other’s decisions.
However, it is not clear to me how these two choices (marked by asterisks) should work. It is also unclear whether the second delegate should be allowed to reject the offer to play a cake-cutting game.
edit: A potential flaw. Suppose we have a bill with two possible voting options A_1 and A_2 (e.g. “yes” and “no”) with no possibility to introduce a new intermediate option. If an option is supported by a small enough minority (0.75), this minority would never be able to achieve it (even though they wouldn’t know that), and the utility difference U_m(A_1) - U_m(A_2) for each m would not matter; only the sign of the difference would.
A remark that seems sufficiently distinct to deserve its own comment. At this moment we are only thinking about delegates with “fixed personalities”. Should the “personality” of a delegate be “recalculated[1]” after each new agreement/trade [2]? Changes would be temporary, only within the context of a given set of bills; delegates would revert to their original “personalities” after the vote. Maybe this could give results that would be vaguely analogous to smoothing a function? This would allow us to have a kind of “persuasion”.
In the context of my comment above, this could enable taking into account utility differences and not just signs, assuming large differences in utility would usually require large changes (and therefore, usually more than one change) in “personality” to invert the sign of it. I admit that this is very handwavy.
[1] I do not know what interpolation algorithm should be used
[2] A second remark. Maybe delegates should trade changes in each other’s “personality” rather than votes themselves, i.e. instead of promising to vote on bills in accordance to some binding agreement, they would promise to perform a minimal possible non-ad-hoc change [3] to their personalities that would make them vote that way? However, this could create slippery slopes, similar to those mentioned here.
[3] This is probably a hard problem
It seems to me that the less personal MPs are, and the fewer opportunities we allow for anthropomorphic persuasion between them (through appeals such as issue framing, pleading, signaling loyalty to a coalition, ingratiation, defamation, challenges to the MPs’ status, and deceit (e.g. unreliable statements by MPs about their private info relevant to probable consequences of acts resulting from the passage of bills)), the more we will encapsulate away the hard problems of moral reasoning within the MPs.
Even persuasive mechanisms more amenable to formalization—like agreements between MPs to reallocate their computational resources, or like risk-sharing agreements between MPs based on their expectations that they might lose future influence in the parliament if the agent changes its assignment of probabilities to the MPs’ moral correctness based on its observation of decision consequences—even these sound to me, in the absence of reasons why they should appear in a theory of how to act given a distribution over self-contained moral theories, like complications that will impede crisp mathematical reasoning, introduced mainly for their similarity to the mechanisms that function in real human parliaments.
Or am I off base, and your scare quotes around “personality” mean that you’re talking about something else? Because what I’m picturing is basically someone building cognitive machinery for emotions, concepts, habits and styles of thinking, et cetera, on top of moral theories.
Well, I agree that I chose words badly and then didn’t explain the intended meaning, and continued to speak in metaphors (my writing skills are seriously lacking). What I called the “personality” of a delegate was a function that assigns a utility score to any given state of the world (at the beginning these are determined by the moral theories). In my first post I thought of these utility functions as constants that stayed fixed throughout the negotiation process (it was my impression that ESRogs’ 3rd assumption implicitly says basically the same thing), with delegates maybe accepting some binding agreements if they help to increase expected utility (these agreements are not treated as part of the utility function; they are ad hoc).
On the other hand, what if we drop the assumption that these utility functions stay constant? What if, e.g. when two delegates meet, instead of exchanging binding agreements to vote in a specific way, they would exchange agreements to self-modify in a specific way that would correspond to those agreements? I.e. suppose a delegate M_1 strongly prefers option O_1,1 to an option O_1,2 on an issue B_1 and slightly prefers O_2,1 to O_2,2 on an issue B_2, whereas a delegate M_2 strongly prefers option O_2,2 to an option O_2,1 on an issue B_2 and slightly prefers O_1,2 to O_1,1 on an issue B_1. Now, M_1 could agree to vote (O_1,1 ;O_2,2) in exchange for a promise that M_2 would vote the same way, and sign a binding agreement. On the other hand, M_1 could agree to self-modify to slightly prefer O_2,2 to O_2,1 in exchange for a promise that M_2 would self-modify to slightly prefer O_1,1 to O_1,2 (both want to self-modify as little as possible, however any modification that is not ad-hoc would probably affect utility function at more than one point (?). Self-modifying in this case is restricted (only utility function is modified), therefore maybe it wouldn’t require heavy machinery (I am not sure), besides, all utility functions ultimately belong to the same persons). These self-modifications are not binding agreements, delegates are allowed to further self-modify their “personalities”(i.e. utility functions) in another exchange.
Now, this idea vaguely reminds me of smoothing over the space of all possible utility functions. Metaphorically, this looks as if delegates were “persuaded” to change their “personalities”, their “opinions about things” (i.e. utility functions), by an “argument” (i.e. an exchange).
I would guess these self-modifying delegates should be used as dummy variables during a finite negotiation process. After the vote, delegates would revert to their original utility functions.
Remember there’s no such thing as zero utility. You can assign an arbitrarily bad value to failing to resolve, but it seems a bit arbitrary.
Hmm. What I was intending to do there was capture the idea that a bill failing to pass is the default state, and I’m only interested in the difference between a bill passing and a bill failing. So the utility score of a bill passing is supposed to represent the difference between it getting passed vs nothing happening.
Does that make sense? Am I just using utility terminology in a confusing way?
Pinning the utility of a failed bill to 0 for all agents gets rid of some free parameters in the model, but it’s not clear to me that it’s the complete way to do so (you still have enough free parameters that you could do more).
What do we get from using the utility per bill framework?
We enforce that the combined desirability of a bill portfolio can only depend on the sum of the individual desirabilities of the bills.
We allow MPs to price gambles between bills.
It’s not clear to me that the second is going to be useful (do they have access to a source of randomness and binding commitments?), and it’s not clear to me that the first is a requirement we actually want to impose. Suppose B1 is something like “cows are people” and B2 is something like “we shouldn’t eat people.” An MP who is against eating humans but for eating cows will flip their opinion on B2 based on the (expected) outcome of B1.
So then it seems like we should assign values to portfolios (i.e. bitstrings of whether or not bills passed), and if we don’t need probabilistic interpretations then we should deal with ordinal rankings of those bitstrings that allow indifference, which would look like (01>11>10=00). A perhaps inaccessible way to talk about those rankings is sets of permutations of bitstrings (the previous ranking is <(01,11,10,00),(01,11,00,10)>).
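A minimal sketch of that portfolio representation (Python; the two-bill example and the ranking are the ones from the previous paragraph):

from itertools import product

# Portfolios are bitstrings recording whether each bill passed.
portfolios = ["".join(bits) for bits in product("01", repeat=2)]  # '00', '01', '10', '11'

# An ordinal ranking with indifference, written as groups from most to least preferred:
# 01 > 11 > 10 = 00.
ranking = [["01"], ["11"], ["10", "00"]]

def rank_of(portfolio):
    # Smaller rank means more preferred; tied portfolios share a rank.
    for r, group in enumerate(ranking):
        if portfolio in group:
            return r
    raise ValueError(portfolio)

print(sorted(portfolios, key=rank_of))  # ['01', '11', '00', '10'], with the last two tied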
That’s a good suggestion about allowing the MPs to assign utilities to portfolios. I went with the per-bill framework because I thought it was simpler, and was trying to find the simplest formalization I could that would capture the interesting parts of the parliamentary model.
But perhaps dependence of bills on each other (or in the real world of actions that one’s moral parliament might take on each other) might be a key feature?
It might be interesting to see if we can analyze both models.
In Ideal Advisor Theories and Personal CEV, my co-author and I describe a particular (but still imprecisely specified) version of the parliamentary approach:
We then very briefly argue that this kind of approach can overcome some objections to parliamentary models (and similar theories) made by philosopher David Sobel.
The paper is short and non-technical, but still manages to summarize some concerns that we’ll likely want a formalized parliamentary model to overcome or sidestep.
META THREAD: what do you think about this project? About Polymath on LW?
I think LW has the right kind of community for Polymath, and I think it’s a good idea to give it a try.
Harras:
It looks like this problem is assuming that Parliament uses plurality voting with more than 2 options. It seems like it shouldn’t be a problem if all votes involve only 2 options (an up-or-down vote on a single bill). If we want the rules to allow votes between more than 2 options, it seems fixable by using a different voting system such as approval voting.
Given delegates with a certain type of amnesia (i.e. they should not remember having voted on an issue before, although they might have to remember some binding agreements (I am not sure about that)), we could replace a plurality vote with an elimination runoff, where at each step of elimination delegates think that this is the only vote on that issue (which is thought to be affected by randomization) and they are not allowed to introduce new options.
Well, this system might have its own disadvantages, possibly similar to these (however, at each step negotiations are allowed), although delegates wouldn’t know how to actually game it.
We discussed this issue at the two MIRIx Boston workshops. A big problem with parliamentary models, which we were unable to solve, is what we’ve been calling ensemble stability. The issue is this: suppose your AI’s value system is made from a collection of value systems in a voting-like arrangement, and the AI is constructing a more powerful successor AI, and is considering constructing the successor so that it represents only a subset of the original value systems. Each value system which is represented will be in favor; each value system which is not represented will be against. To keep that from happening, you either need a voting system which somehow reliably never does that (but nothing we tried worked), or a special case for constructing successors, and a working loophole-free definition of that case (which is Hard).
This seems to be almost equivalent to irreversibly forming a majority voting bloc. The only difference is how they interact with the (fake) randomization: by creating a subagent, it effectively (perfectly) correlates all the future random outputs. (In general, I think this will change the outcomes unless agents’ (cardinal) preferences about different decisions are independent).
The randomization trick still potentially helps here: it would be in each representative’s interest to agree not to vote for such proposals, prior to knowing which such proposals will come up and in which order they’re voted on. However, depending on what fraction of its potential value an agent expects to be able to achieve through negotiations, I think that some agents would not sign such an agreement if they know they will have the chance to try to lock their opponents out before they might get locked out.
Actually, there seems to be a more general issue with ordering and incompatible combinations of choices - splitting that into a different comment.
It seems to me that if we’re going to be formalizing the idea of the relative “moral importance” of various courses of action to different moral theories, we’ll end up having to use something like utility functions. It’s unfortunate, then, that deontological rules (which are pretty common) can’t be specified with finite utility functions because of the timelessness issue (i.e., a deontologist who doesn’t lie won’t lie even if doing so would prevent them from being forced to tell ten lies in the future).
Well, the entire idea of the parliamentary approach is predicated on the idea that the parliamentarians have some actions that they consider “more bad” than other actions.
How’s this for a formalization: Our parliament faces a series of decisions d[i]. For any given decision the parliament faces, there is a set of choices d[i][j] that could be made regarding it (d[i][0], d[i][1], etc.).
Over any given session of a parliament, the parliament faces every decision d[i] and, for each decision it faces, makes a choice d[i][j] regarding how to address it. A structure containing all the decisions the parliament faces and a choice for each is a “decision record”. A parliamentarian’s preferences are specified by an ordering of decision records from most preferred to least preferred. The total number of possible decision records is equal to the product of the numbers of choices for each individual decision.
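To make the counting explicit, here is a small Python sketch of decision records (the decisions and choice labels are arbitrary placeholders):

from itertools import product

# Three decisions, with 2, 3, and 2 available choices respectively.
choices = [["d0c0", "d0c1"], ["d1c0", "d1c1", "d1c2"], ["d2c0", "d2c1"]]

# A decision record picks one choice per decision, so there are 2 * 3 * 2 = 12 of them.
decision_records = list(product(*choices))
print(len(decision_records))  # 12

# A parliamentarian's preferences are then just an ordering of these records;
# here we sort them lexicographically by label, purely as a placeholder ordering.
example_ordering = sorted(decision_records)
print(example_ordering[0], example_ordering[-1])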
I like the formalization, but it seems to miss a key feature of the parliamentary model. Per Bostrom,
If preferences are only defined by an ordering of possible outcomes, then you would get something like this:
Total Utilitarian := (Donate 100% of income to existential risk reduction and otherwise behave selflessly, Donate 100% to x-risk and behave egoistically, Donate 40% and behave selflessly, Donate 40% and behave egoistically, 0% and selfless, 0% and egoistic)
Egoist := Reverse(Total Utilitarian)
Then what particular reason do we have to expect them to end up compromising at [40% and egoistic], rather than (say) [0% and selfless]? Obviously the total utilitarian would much prefer to donate 40% of their income to x-risk reduction and behave selfishly in interpersonal circumstances than to do the reverse (donate nothing but take time out to help old ladies across the road, etc.). But any system for arriving at the fairer compromise just on the basis of those ordinal preferences over decisions could be manipulated into deciding differently just by introducing [39.9% and egoistic] or [0.1% and selfless] as a bill, or whatever. The cardinal aspect of the total utilitarian’s preference is key to being able to consistently decide what tradeoffs that philosophy would be willing to make.
(NB: I’m aware that I’m being terribly unfair to the object-level moral philosophies of egoism and total utilitarianism, but I hope that can be forgiven along with my terrible notation in service of the broader point)
Edit: gjm puts it better
Can’t we use a hierarchy of ordinal numbers and a different ordinal sum (e.g. maybe something of Conway’s) in our utility calculations?
That is, lying would be infinitely bad, but lying ten times would be infinitely worse.
To avoid the timelessness issue, the parliament could be envisioned as voting on complete courses of action over the foreseeable future, rather than separate votes taken on each action. Then the deontologists’ utility function could return 0 for all unacceptable courses of action and 1 for all acceptable courses of action.
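A minimal sketch of that 0/1 utility function (Python; the forbidden acts and the encoding of a course of action as a list of act labels are placeholders):

def deontological_utility(course_of_action, forbidden=frozenset({"lie", "murder"})):
    # A course of action is the complete sequence of acts over the foreseeable future.
    # It is unacceptable (utility 0) if it contains any forbidden act, acceptable (1) otherwise.
    return 0 if any(act in forbidden for act in course_of_action) else 1

print(deontological_utility(["help", "donate", "tell the truth"]))  # 1
print(deontological_utility(["lie", "help"]))                       # 0
print(deontological_utility(["lie"] * 10))                          # still 0: no "ten times worse"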
Maybe a deontological theory can be formalized as a parliamentary faction that has only one right option for each decision, always votes for that option, and can’t be bargained into changing its vote. This formalization has an unfortunate consequence: if some deontological theory has more than 50% credence, the agent will always act on it. But if no deontological theory has more than a 50% fraction, this formalization can be reasonable.
Eliezer Yudkowsky:
The threats problem seems like a specific case of problems that might arise by putting real intelligence into the agents in the system. Especially if this moral theory were being run on a superintelligent AI, it seems like the agents might be able to come up with all sorts of creative unexpected stuff. And I’m doubtful that creative unexpected stuff would make the parliament’s decisions more isomorphic to the “right answer”.
One way to solve this problem might be to drop any notion of “intelligence” in the delegates and instead specify a deterministic algorithm that any individual delegate follows in deciding which “deals” they accept. Or take the same idea even further and specify a deterministic algorithm for resolving moral uncertainty that is merely inspired by the function of parliaments, in the same sense that the stable marriage problem and algorithms for solving it could have been inspired by the way people decide who to marry.
Eliezer’s notion of a “right answer” sounds appealing, but I’m a little skeptical. In computer science, it’s possible to prove that a particular algorithm, when run, will always achieve the maximal “score” on a criterion it’s attempting to optimize. But in this case, if we could formalize a score we wanted to optimize for, that would be equivalent to solving the problem! That’s not to say this is a bad angle of approach, however… it may be useful to take the idea of a parliament and use it to formalize a scoring system that captures our intuitions about how different moral theories trade off and then maximize this score using whatever method seems to work best. For example waves hands perhaps we could score the total regret of our parliamentarians and minimize that.
Another approach might be to formalize a set of criteria that a good solution to the problem of moral uncertainty should achieve and then set out to design an algorithm that achieves all of these criteria. In other words, making a formal problem description that’s more like that of the stable marriage problem and less like that of the assignment problem.
So one plan of attack on the moral uncertainty problem might be:
Generate a bunch of “problem descriptions” for moral uncertainty that specify a set of criteria to satisfy/optimize.
Figure out which “problem description” best fits our intuitions about how moral uncertainty should be solved.
Find an algorithm that provably solves the problem as specified in its description.
One route towards analysing this would be to identify a unit of currency which was held in roughly equal value by all delegates (at least at the margin), so that we can analyse how much they value other things in terms of this unit of currency—this could lead to market prices for things (?).
Perhaps a natural choice for a currency unit would be something like ‘unit of total say in the parliament’. So for example a 1% chance that things go the way of your theory, applied before whatever else would happen.
I’m not sure if this could even work, just throwing it out there.
The idea of explicit vote-selling is probably the easiest way to have ‘enforceable contracts’ without things getting particularly sticky. (If you have ordered votes and no enforceable contracts, then vote order becomes super important and trading basically breaks apart. But if you have ordered votes and vote sales, then trading is still possible because the votes can’t switch.)
But I don’t think the prices are going to be that interesting: if the vote’s on the edge, then all votes are valuable, but as soon as one vote changes hands the immediate price of all votes drops back to 0. Calculating the value of, say, amassing enough votes to deter any trading on that vote seems like it might add a lot of murkiness without much increased efficiency.
The voting system is set up to avoid these edge effects. From the opening post:
Hm, somehow I failed to notice that. It’s not clear to me that you want to avoid the edge effects, though; delegates might trade away influence on contentious issues (where we have significant moral uncertainty) to double down on settled issues (where we have insignificant moral uncertainty), if the settled issues are sufficiently important. Per Eliezer’s concern, delegates could threaten to vote ‘no’ on something important, which would make others desperately buy their votes away from them, unless you have a nonlinearity which makes the delegates secure that a lone filibuster won’t cause trouble.
[edit]On second thought, though, it seems likely to be desirable that delegates / the parliament would behave linearly in the probability of various moral theories. The concern is mostly that this means we’ll end up doing averaging, and nothing much more interesting.
Is there some way to rephrase this without bothering with the parliament analogy at all? For example, how about just having each moral theory assign the available actions a “goodness number” (basically expected utility). Normalize the goodness numbers somehow, then just take the weighted average across moral theories to decide what to do.
If we normalize by dividing each moral theory’s answers by its biggest-magnitude answer, (only closed sets of actions allowed :) ) I think this regenerates the described behavior, though I’m not sure. Obviously this cuts out “human-ish” behavior of parliament members, but I think that’s a feature, since they don’t exist.
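A minimal sketch of that aggregation rule (Python; the theories, actions, credences, and goodness numbers are all made up for illustration):

# goodness[theory][action]: each theory's expected-utility-style score for each action.
goodness = {
    "total utilitarianism": {"donate 0%": -10.0, "donate 40%": 10.0, "donate 100%": 12.0},
    "egoism":               {"donate 0%":   5.0, "donate 40%": -2.0, "donate 100%": -5.0},
}
credence = {"total utilitarianism": 0.1, "egoism": 0.9}

def normalised(scores):
    # Divide a theory's scores by its biggest-magnitude score, as suggested above.
    biggest = max(abs(v) for v in scores.values())
    return {action: v / biggest for action, v in scores.items()}

def aggregate(goodness, credence):
    actions = next(iter(goodness.values())).keys()
    norm = {theory: normalised(scores) for theory, scores in goodness.items()}
    return {a: sum(credence[t] * norm[t][a] for t in goodness) for a in actions}

scores = aggregate(goodness, credence)
print(max(scores, key=scores.get), scores)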
There’s a family of approaches here, but it’s not clear that they recreate the same behaviour as the parliament (at least without more arguments about the parliament). Whether they are more or less desirable is a separate question.
Incidentally, the version that you suggest isn’t quite well-defined, since it can be changed by adding a constant to a theory’s utility function. But that can easily be patched over.
I’ve argued that normalising the variance of the functions is the most natural of these approaches (link to a paper giving the arguments in a social choice context; forthcoming paper with Ord and MacAskill in the moral uncertainty context).
I like the originality of the geometric approach. I don’t think it’s super useful, but then again you made good use of it in Theorem 19, so that shows what I know.
I found the section on voting to need revision for clarity. Is the idea that each voter submits a function, the outcomes are normalized and summed, and the outcome with the highest value wins (like in range voting—except fixed-variance voting)? Either I missed the explanation or you need to explain this. Later in Theorem 14 you assumed that each agent voted with its utility function (proved later in Thm 19, good work by the way, but please don’t assume it without comment earlier), and we need to remember that all the way back in 4.0 you explained why to normalize v and u the same.
Overall I’d like to see you move away from the shaky notion of “a priori voting power” in the conclusion, by translating from the case of voting back into the original case of moral philosophy. I’m pretty sold that variance normalization is better than range normalization though.
Thanks for the feedback!
I think the key benefit of the parliamentary model is that the members will trade votes in order to maximize their expectation.
My suspicion is that this just corresponds to some particular rule for normalizing preferences over strategies. The “amount of power” given to each faction is capped, so that even if some faction has an extreme opinion about one issue it can only express itself by being more and more willing to trade other things to get it.
If goodness numbers are normalized, and some moral theory wants to express a large relative preference for one thing over another, it can’t just crank up the number on the thing it likes—it must flatten the contrast of things it cares less about in order to express a more extreme preference for one thing.
I propose to work through a simple example to check whether it aligns with the methods which normalise preferences and sum even in a simple case.
Setup:
Theory I, with credence p, and Theory II, with credence 1-p.
We will face a decision either between A and B (with probability 50%), or between C and D (with probability 50%).
Theory I prefers A to B and prefers C to D, but cares twice as much about the difference between A and B as that between C and D.
Theory II prefers B to A and prefers D to C, but cares twice as much about the difference between D and C as that between B and A.
Questions: What will the bargaining outcome be? What will normalisation procedures do?
Normalisation procedures: if they are ‘structural’ (not caring about details like the names of the theories or outcomes), then the two theories are symmetric, so they must be normalised in the same way. WLOG, as follows:
T1(A) = 2, T1(B) = 0, T1(C) = 1, T1(D) = 0
T2(A) = 0, T2(B) = 1, T2(C) = 0, T2(D) = 2
Then letting q = (1-p) the aggregate preferences T are given by:
T(A) = 2p, T(B) = q, T(C) = p, T(D) = 2q
So:
if p > 2⁄3, the aggregate chooses A and C
if 1⁄3 < p < 2⁄3, the aggregate chooses A and D
if p < 1⁄3, the aggregate chooses B and D
The advantage of this simple set-up is that I didn’t have to make any assumptions about the normalisation procedure beyond that it is structural. If the bargaining outcome agrees with this we may need to look at more complicated cases; if it disagrees we have discovered something already.
For the bargaining outcome, I’ll assume we’re looking for a Nash Bargaining Solution (as suggested in another comment thread).
The defection point has expected utility 3p/2 for Theory I and expected utility 3q/2 for Theory II (using the same notation as I did in this comment).
I don’t see immediately how to calculate the NBS from this.
Assume p = 2⁄3.
Then Theory I has expected utility 1, and Theory 2 has expected utility 1⁄2.
Assume (x,y) is the solution point, where x represents probability of voting for A (over B), and y represents probability of voting for C (over D). I claim without proof that the NBS has x=1 … seems hard for this not to be the case, but would be good to check it carefully.
Then the utility of Theory 1 for the point (1, y) = 1 + y/2, and utility of Theory 2 = 1 - y. To maximise the product of the benefits over the defection point we want to maximise y/2*(1/2 - y). This corresponds to maximising y/2 - y^2. Taking the derivative this happens when y = 1⁄4.
Note that the normalisation procedure leads to being on the fence between C and D at p = 2⁄3.
If I’m correct in my ad-hoc approach to calculating the NBS when p = 2⁄3, then this is firmly in the territory which prefers D to C. Therefore the parliamentary model gives different solutions to any normalisation procedure.
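For anyone who wants to check this numerically, here is a short Python sketch of the bargaining problem above: x is the probability of choosing A over B, y of choosing C over D, and a grid search looks for the policy maximising the Nash product over the no-trade defection point.

import numpy as np

p = 2 / 3
q = 1 - p

def eu1(x, y):
    # Theory I: A = 2, B = 0, C = 1, D = 0; each decision arises with probability 1/2.
    return 0.5 * (2 * x) + 0.5 * (1 * y)

def eu2(x, y):
    # Theory II: A = 0, B = 1, C = 0, D = 2.
    return 0.5 * (1 * (1 - x)) + 0.5 * (2 * (1 - y))

d1, d2 = 1.5 * p, 1.5 * q  # defection point: credence-weighted randomisation, no trade

grid = np.linspace(0, 1, 401)
best = max(
    ((x, y) for x in grid for y in grid if eu1(x, y) > d1 and eu2(x, y) > d2),
    key=lambda xy: (eu1(*xy) - d1) * (eu2(*xy) - d2),
)
print(best)  # approximately (1.0, 0.25), matching the hand calculation above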
Yes, assuming that the delegates always take any available Pareto improvements, it should work out to that [edit: nevermind; I didn’t notice that owencb already showed that that is false]. That doesn’t necessarily make the parliamentary model useless, though. Finding nice ways to normalize preferences is not easy, and if we end up deriving some such normalization rule with desirable properties from the parliamentary model, I would consider that a success.
Harsanyi’s theorem will tell us that it will after the fact be equivalent to some normalisation—but the way you normalise preferences may vary with the set of preferences in the parliament (and the credences they have). And from a calculation elsewhere in this comment thread I think it will have to vary with these things.
I don’t know if such a thing is still best thought of as a ‘rule for normalising preferences’. It still seems interesting to me.
Yes, that sounds right. Harsanyi’s theorem was what I was thinking of when I made the claim, and then I got confused for a while when I saw your counterexample.
This actually sounds plausible to me, but I’m not sure how to work it out formally. It might make for a suprising and interesting result.
I think there’s already been a Stuart Armstrong post containing the essential ideas, but I can’t find it. So asking him might be a good start.
Any parliamentary model will involve voting.
When voting, Arrow’s impossibility theorem is going to impose constraints that can’t be avoided: http://en.m.wikipedia.org/wiki/Arrow’s_impossibility_theorem
In particular it is impossible to have all of the below
If every voter prefers alternative X over alternative Y, then the group prefers X over Y.
If every voter’s preference between X and Y remains unchanged, then the group’s preference between X and Y will also remain unchanged (even if voters’ preferences between other pairs like X and Z, Y and Z, or Z and W change).
There is no “dictator”: no single voter possesses the power to always determine the group’s preference.
So it’s worthwhile to pick which bullet to bite first and design with that in mind as a limitation rather than just getting started and later on realize you’re boxed into a corner on this point.
The easiest bullet to bite is the “ordinal preferences” bullet. If you allow the group to be indifferent between options, then the impossibility disappears. (You may end up with a group that uses a sensible voting rule that is indifferent between all options, but that’s because the group is balanced in its opposition.)
This doesn’t work so well if you want to use it as a decision rule. You may end up with some ranking which leaves you indifferent between the top two options, but then you still need to pick one. I think you need to explain why whatever process you use to do that wasn’t considered part of the voting system.
It seems to me that decision rules that permit indifference are more useful than decision rules that do not permit indifference, because fungibility of actions is a useful property. That is, I would view the decision rule as expressing preferences over classes of actions, but not specifying which of the actions to take within the class because it doesn’t see a difference between them. Considering Buridan’s Ass, it would rather “go eat hay” than “not go eat hay,” but doesn’t have a high-level preference for the left or right bale of hay, just like it doesn’t have a preference whether it starts walking with its right hoof or its left hoof.
Something must have a preference—perhaps the Ass is right-hoofed, and so it leads with its right hoof and goes to the right bale of hay—but treating that decision as its own problem of smaller scope seems superior to me than specifying every possible detail in the high-level decision problem.
This is the condition I want to give up on. I’m not even convinced that it’s desirable.
Something like independence of irrelevant alternatives or, at least, independence of clones is necessary to avoid spoiler effect, otherwise one can get situations like this one.
Yes I think independence of clones is quite strongly desirable.
I was thinking last night of how vote trading would work in a completely rational parliamentary system. To simplify things a bit, let’s assume that each issue is binary, each delegate holds a position on every issue, and that position can be normalized to a 0.0 to 1.0 ranking. (e.g. If I have a 60% belief that I will gain 10 utility from this issue being approved, it may have a normalized score of .6; if it is a 100% belief that I will gain 10 utility it may be a .7; while a 40% chance of −1000 utility may be a .1.) The mapping function doesn’t really matter too much, as long as it can map to the 0-1 scale for simplification.
The first point that seems relatively obvious to me is that all rational agents will intentionally misstate their utility functions as extremes for bargaining purposes. In a trade, you should be able to get a much better exchange by offering to update from 0 to 1 than you would for updating from 0.45 to 1, and as such, I would expect all utility function outputs to be reported to others as either 1 or 0, which simplifies things even further, though internally, each delegate would keep their true utility function values. (As a sanity check, compare this to the current parliamentary models in the real world, where most politicians represent their ideals publicly as either strongly for or strongly against.)
The second interesting point I noticed is that with the voting system as proposed, where every additional vote grants additional probability of the measure being enacted, every vote counts. This means it is always a good trade for me to exchange votes when my expected value of the issue you are changing position on is higher than my expected value of the position I am changing position on. This leads to a situation where I am better off changing positions on every issue except the one that brings me the most utility, in exchange for votes on the issue that brings me the most utility. Essentially, this means that the only issue that matters to an individual delegate is the issue that potentially brings them the most utility, and the rest of the issues are just fodder for trading.
Given the first point I mentioned, that all values should be externally represented as either 1 or 0, it seems that any vote trade will be a straight 1 for 1 trade. I haven’t exactly worked out the math here, but I’m pretty sure that for an arbitrarily large parliament with an arbitrarily large number of issues (to be used for trading), the result of any given vote will be determined by the proportion of delegates holding that issue as either their highest or lowest utility issue, with the rest of the delegates trading their votes on that issue for votes on another issue they find to be higher utility. (As a second sanity check, this also seems to conform closely to reality with the way lobbyist groups push single issues and politicians trade votes to push their pet issues through the vote.)
This is probably an oversimplified case, but I thought I’d throw it for discussion to see if it sparked any new ideas.
Because we’re working in an idealised hypothetical, we could decree that they can’t do this (they must all wear their true utility functions on their sleeves). I don’t see a disadvantage to demanding this.
If what you say is true about all trades being 1-for-1, that seems more like a bug than a feature; if an agent doesn’t have any votes valuable enough to sway others, it seems like I’d want them to be able (i.e. properly incentivized) to offer more votes, so that the system overall can reflect the aggregate’s values more sensitively. I don’t have a formal criterion that says why this would be better, but maybe that points towards one.
Can MPs have unknown utility functions? For example, I might have a relatively low confidence in all explicitly formulated moral theories, and want to give a number of MPs to System 1 - but I don’t know in advance how System 1 will vote. Is that problem outside the scope of the parliamentary model (i.e., I can’t nominate MPs who don’t “know” how they will vote)?
Can MPs have undecidable preference orderings (or sub-orderings)? E.g., such an MP might have some moral axioms that provide orderings for some bills but not others.