The first is that expected utility maximization isn’t the same thing as utilitarianism.
The problem is that if I adopt unbounded utility maximization, then I perceive it to converge with utilitarianism. Even completely selfish values seem to converge with utilitarian motives. Not only does every human, however selfish, care about other humans, but other humans are also instrumental to their own terminal values.
Solving friendly AI is a way to survive. As long as you don’t expect to be able to overpower all other agents by creating your own FOOMing AI, the best move is to play the altruism card and argue in favor of making an AI friendly_human.
Another important aspect is that it might be rational to treat copies of you, or agents with similar utility functions (or ultimate preferences), as yourself (or at least to assign non-negligible weight to them). One argument in favor of this is that the goals of rational agents with the same preferences will ultimately converge, and such agents are therefore instrumental in realizing what you want.
But even if you care only a little about anything beyond the near-term goals revealed to you by naive introspection, taking into account infinite (or nearly infinite, e.g. 3^^^^3) scenarios can easily outweigh those goals.
All in all, if you adopt unbounded utility maximization and you are not completely alien, you might very well end up pursuing utilitarian motives.
A real-world example is my vegetarianism. I assign some weight to sub-human suffering, enough to outweigh the joy of eating meat. Yet I am willing to consume medical comforts that are the result of animal experimentation, and I would also eat meat if I would otherwise die. But if the suffering were great enough, e.g. 3^^^^3 pigs being eaten, I would die even for sub-human beings. As a result, once I take infinite scenarios into account, my terminal values converge with those of someone who subscribes to utilitarianism.
The problem, my problem, is that if all beings thought like this and sacrificed their own lives, no being would end up maximizing utility. This is contradictory. One might argue that it is incredibly unlikely to be in a position to influence so many other beings, and that one should therefore devote some resources to selfish near-term values. But charities like the SIAI claim that I am in a position to influence enough beings to outweigh any other goals. At the end of the day I am left with the decision to either abandon unbounded utility maximization or indulge in the craziness of infinite ethics.
At the end of the day I am left with the decision to either abandon unbounded utility maximization or indulge in the craziness of infinite ethics.
How about, for example, assigning .5 probability to a bounded utility function (U1), and .5 probability to an unbounded (or practically unbounded) utility function (U2)? You might object that taking the average of U1 and U2 still gives an unbounded utility function, but I think the right way to handle this kind of value uncertainty is by using a method like the one proposed by Bostrom and Ord, in which case you ought to end up spending roughly half of your time/resources on what U1 says you should do, and half on what U2 says you should do.
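To make the resource-splitting reading concrete, here is a minimal sketch in Python, assuming (as comes up further down in the thread) that U1 and U2 are largely independent and sublinear in resources. The square-root payoff and the specific numbers are illustrative assumptions, not anything Bostrom and Ord specify.

```python
import math

# Toy model: split a pool of resources between the two utility functions in
# proportion to the credence assigned to each. The sqrt payoff is an assumed,
# purely illustrative stand-in for "sublinear in resources".
credences = {"U1 (bounded)": 0.5, "U2 (practically unbounded)": 0.5}
total_resources = 100.0

def payoff(resources):
    return math.sqrt(resources)  # half the resources buys ~71% of the value

allocation = {name: p * total_resources for name, p in credences.items()}
for name, r in allocation.items():
    print(f"{name}: {r:.0f} units, payoff {payoff(r):.1f} by its own lights")
```

The sublinearity assumption is doing the work here: each side still gets most of what it wants from half the resources, which is why the even split looks fair rather than wasteful.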
I haven’t studied all the discussions on the parliamentary model, but I’m finding it hard to understand what the implications are, and hard to judge how close to right it is. Maybe it would be enlightening if some of you who do understand the model took a shot at answering (or roughly approximating the answers to) some practice problems? I’m sure some of these are underspecified and anyone who wants to answer them should feel free to fill in details. Also, if it matters, feel free to answer as if I asked about mixed motivations rather than moral uncertainty:
I assign 50% probability to egoism and 50% to utilitarianism, and am going along splitting my resources about evenly between those two. Suddenly and completely unexpectedly, Omega shows up and cuts down my ability to affect my own happiness by a factor of one hundred trillion. Do I keep going along splitting my resources about evenly between egoism and utilitarianism?
I’m a Benthamite utilitarian but uncertain about the relative values of pleasure (measured in hedons, with a hedon calibrated as e.g. me eating a bowl of ice cream) and pain (measured in dolors, with a dolor calibrated as e.g. me slapping myself in the face). My probability distribution over the 10-log of the number of hedons that are equivalent to one dolor is normal with mean 2 and s.d. 2. Someone offers me the chance to undergo one dolor but get N hedons. For what N should I say yes? (A naive expected-value baseline for this one is sketched after these problems.)
I have a marshmallow in front of me. I’m 99% sure of a set of moral theories that all say I shouldn’t be eating it because of future negative consequences. However, I have this voice telling me that the only thing that matters in all the history of the universe is that I eat this exact marshmallow in the next exact minute and I assign 1% probability to it being right. What do I do?
I’m 80% sure that I should be utilitarian, 15% sure that I should be egoist, and 5% sure that all that matters is that egoism plays no part in my decision. I’m given a chance to save 100 lives at the price of my own. What do I do?
I’m 100% sure that the only thing that intrinsically matters is whether a light bulb is on or off, but I’m 60% sure that it should be on and 40% sure that it should be off. I’m given an infinite sequence of opportunities to flip the switch (and no opportunity to improve my estimates). What do I do?
There are 1000 people in the universe. I think my life is worth M of theirs, with the 10-log of M uniformly distributed from −3 to 3. I will be given the opportunity to either save my own life or 30 other people’s lives, but first I will be given the opportunity to either save 3 people’s lives or learn the exact value of M with certainty. What do I do?
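For the second practice problem (hedons vs. dolors), here is a minimal sketch of the naive expected-value baseline, i.e. simply maximizing expected net hedons rather than applying the parliamentary model. The distribution parameters come straight from the problem statement; everything else is an assumption of the baseline.

```python
import math

# X = log10(hedons equivalent to one dolor) ~ Normal(mu=2, sd=2),
# so H = 10**X is lognormal with E[H] = exp(mu*ln(10) + (sigma*ln(10))**2 / 2).
mu, sigma = 2.0, 2.0
ln10 = math.log(10)

expected_hedons_per_dolor = math.exp(mu * ln10 + (sigma * ln10) ** 2 / 2)
print(f"E[hedons per dolor] ~= {expected_hedons_per_dolor:.3g}")  # ~4.0e6

# Under the naive baseline: accept N hedons for one dolor iff N > E[H].
```

On that baseline you accept roughly when N exceeds four million hedons; whether and how the parliamentary model changes that is exactly what the practice problem is asking.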
Why spend only half on U1? Spend (1 - epsilon). And write a lottery ticket giving the U2-oriented decision maker the power with probability epsilon. Since epsilon × infinity = infinity, you still get infinite expected utility (according to U2). And you also get pretty close to the max possible according to U1.
Infinity has uses even beyond allocating hotel rooms. (HT to A. Hajek)
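As a sanity check on the arithmetic, here is a minimal sketch of the epsilon-lottery proposal; the payoff numbers are assumptions chosen only to show that epsilon times infinity is still infinity.

```python
import random

# U1's decision maker gets control with probability (1 - epsilon),
# U2's with probability epsilon.
epsilon = 1e-9
u1_max = 100.0            # best achievable outcome by U1's lights (assumed)
u2_payoff = float("inf")  # U2 assigns (practically) unbounded value to its plan

expected_u1 = (1 - epsilon) * u1_max  # ~= 100.0, essentially U1's maximum
expected_u2 = epsilon * u2_payoff     # inf: epsilon * infinity is still infinity
print(expected_u1, expected_u2)

# A single draw of the lottery itself:
controller = "U2" if random.random() < epsilon else "U1"
print("this round is decided by", controller)
```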
Of course, Hajek’s reasoning also makes it difficult to locate exactly what it is that U2 “says you should do”.
In general, it should be impossible to allocate 0 to U2 in this sense. What’s the probability that an angel comes down and magically forces you to do the U2 decision? Around epsilon, I’d say.
U2 then becomes totally meaningless, and we are back with a bounded utility function.
you ought to end up spending roughly half of your time/resources on what U1 says you should do, and half on what U2 says you should do
That can’t be right. What if U1 says you ought to buy an Xbox, and U2 says you ought to throw it away? Looks like a waste of resources. To avoid such waste, your behavior must be Bayesian-rational. That means it must be governed by a utility function U3. What U3 does the parliamentary model define? You say it’s not averaging, but it has to be some function defined in terms of U1 and U2.
We’ve discussed a similar problem proposed by Stuart on the mailing list, and I believe I gave a good argument (on Jan 21, 2011) that U3 must be some linear combination of U1 and U2 if you want nice things like Pareto-optimality. All bargaining should be collapsed into the initial moment, and its output should be the coefficients of the linear combination, which never change from that point on.
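A toy illustration of that claim, with made-up payoffs: the Pareto-dominated option never maximizes any linear combination w·U1 + (1 − w)·U2, while each Pareto-optimal option is picked by some coefficient w, so bargaining only has to settle w once.

```python
# Hypothetical policies with their (U1-value, U2-value).
options = {
    "buy Xbox":   (10.0, 0.0),
    "donate":     (0.0, 10.0),
    "compromise": (7.0, 7.0),
    "wasteful":   (3.0, 3.0),   # Pareto-dominated by "compromise"
}

def best_for(w):
    """Policy maximizing the linear combination w*U1 + (1 - w)*U2."""
    return max(options, key=lambda name: w * options[name][0] + (1 - w) * options[name][1])

for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(w, best_for(w))   # "wasteful" never wins; the other three each win for some w
```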
Right, clearly what I said can’t be true for arbitrary U1 and U2, since there are obvious counterexamples. And I think you’re right that theoretically, bargaining just determines the coefficients of the linear combination of the two utility functions. But it seems hard to apply that theory in practice, whereas if U1 and U2 are largely independent and sublinear in resources, splitting resources between them equally (perhaps with some additional Pareto improvements to take care of any noticeable waste from pursuing two completely separate plans) seems like a fair solution that can be applied in practice.
(ETA side question: does your argument still work absent logical omniscience, for example if one learns additional logical facts after the initial bargaining? It seems like one might not necessarily want to stick with the original coefficients if they were negotiated based on an incomplete understanding of what outcomes are feasible, for example.)
My thoughts:
You do always get a linear combination.
I can’t tell what that combination is, which is odd. The non-smoothness is problematic. You run right up against the constraints, and I don’t remember how to deal with this. Can you?
If you have N units of resources which can be devoted to either task A or task B, the ratio of resources used will equal the ratio of votes (a toy version of this allocation is sketched after this comment).
I think it depends on what kind of contract you sign. If I sign a contract that says “we decide according to this utility function”, you get something different than a contract that says “we vote yes in these circumstances and no in those circumstances”. The second kind of contract you can renegotiate, and that can change the utility function.
ETA:
In the case where utility is linear in the set of decisions that go to each side, for any Pareto-optimal allocation that both parties prefer to the starting (random) allocation, you can construct a set of prices that is consistent with that allocation. So you’re reduced to bargaining, which I guess means Nash arbitration.
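A minimal sketch of the vote-ratio allocation mentioned above, with hypothetical numbers (a 60/40 split of delegates and 10 divisible units of resources):

```python
# Resources devoted to each task are proportional to the votes behind it.
N = 10.0
votes = {"task_A": 60, "task_B": 40}
total_votes = sum(votes.values())

allocation = {task: N * v / total_votes for task, v in votes.items()}
print(allocation)  # {'task_A': 6.0, 'task_B': 4.0}
```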
I don’t know how to make decisions under logical uncertainty in general. But in our example I suppose you could try to phrase your uncertainty about logical facts you might learn in the future in Bayesian terms, and then factor it into the initial calculation.
The first is that expected utility maximization isn’t the same thing as utilitarianism.
The problem is that if I adopt unbounded utility maximization, then I perceive it to converge with utilitarianism. Even completely selfish values seem to converge with utilitarian motives. Not only does every human, however selfish, care about other humans, but other humans are also instrumental to their own terminal values.
These are surely really, really different things. Utilitarianism says to count people more-or-less equally. However, the sort of utility maximization that actually goes on in people’s heads typically results in people valuing their own existence vastly above that of everyone else. That is because they were built that way by evolution—which naturally favours egoism. So, their utility function says: “Me, me, me! I, me, mine!” This is not remotely like utilitarianism—which explains why utilitarians have such a hard time acting on their beliefs—they are wired up by nature to do something totally different.
Also, you probably should not say “instrumental to their own terminal values”. “Instrumental” in this context usually refers to “instrumental values”. Using it to mean something else is likely to mangle the reader’s mind.
At the end of the day I am left with the decision to either abandon unbounded utility maximization or indulge in the craziness of infinite ethics.
So, I think about things like infinite ethics all the time, and it doesn’t seem to disturb me to the extent it does you. You might say, “My brain is set up such that I automatically feel a lot of tension/drama when I feel like I might be ignoring incredibly morally important things.” But it is unclear that this need be the case. I can’t imagine that the resulting strain is useful in the long run. Have you tried jumping up a meta-level, tried to understand and resolve whatever’s causing the strain? I try to think of it as moving in harmony with the Dao.