What if we also add a requirement that the FAI doesn’t make anyone worse off in expected utility compared to no FAI? That seems reasonable, but conflicts with the other axioms. For example, suppose there are two agents: A gets 1 util if 90% of the universe is converted into paperclips, 0 utils otherwise, and B gets 1 util if 90% of the universe is converted into staples, 0 utils otherwise. Without an FAI, they’ll probably end up fighting each other for control of the universe, and let’s say each has a 30% chance of success. An FAI that doesn’t make either of them worse off has to prefer a 50/50 lottery of the universe turning into either paperclips or staples to a certain outcome of either, but that violates VNM rationality.
And things get really confusing when we also consider issues of logical uncertainty and dynamical consistency.
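A minimal numeric check of this example (an editorial sketch; the outcome labels and numbers follow the comment above, everything else is assumed):

```python
# Sketch of the paperclip/staple example: two agents, two relevant pure outcomes.
outcomes = {
    "paperclips": {"A": 1.0, "B": 0.0},
    "staples":    {"A": 0.0, "B": 1.0},
}
status_quo = {"A": 0.3, "B": 0.3}  # expected utility of fighting over the universe

def expected_utilities(lottery):
    """Each agent's expected utility under a lottery {outcome: probability}."""
    return {agent: sum(p * outcomes[o][agent] for o, p in lottery.items())
            for agent in ("A", "B")}

# Neither pure outcome satisfies the "no one worse off in expectation" constraint:
for o in outcomes:
    eu = expected_utilities({o: 1.0})
    print(o, eu, all(eu[a] >= status_quo[a] for a in eu))      # False for both

# ...but the 50/50 lottery does:
fair = expected_utilities({"paperclips": 0.5, "staples": 0.5})
print(fair, all(fair[a] >= status_quo[a] for a in fair))       # {'A': 0.5, 'B': 0.5} True

# A VNM aggregator values the 50/50 lottery at 0.5*U(paperclips) + 0.5*U(staples),
# which can never exceed both U(paperclips) and U(staples), so no single VNM
# utility function can strictly prefer the lottery to each pure outcome -- the
# conflict pointed out above.
```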
Sounds obviously unreasonable to me. E.g. a situation where a person derives a large part of their utility from having kidnapped and enslaved somebody else: the kidnapper would be made worse off if their slave was freed, but the slave wouldn’t become worse off if their slavery merely continued, so...
The way I said that may have been too much of a distraction from the real problem, which I’ll restate as: considerations of fairness, which may arise from bargaining or just due to fairness being a terminal value for some people, can imply that the most preferred outcome lies on a flat part of the Pareto frontier of feasible expected utilities, in which case such preferences are not VNM rational and the result described in the OP can’t be directly applied.
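To spell out the step from “flat part of the Pareto frontier” to “not VNM rational” (an editorial gloss; the notation is not taken from the comment): if the flat segment is traced out by lotteries between two feasible points $X$ and $Y$, then any VNM aggregator assigns the mixture a value linear in the mixing probability,

$$V\bigl(pX + (1-p)Y\bigr) = p\,V(X) + (1-p)\,V(Y) \le \max\{V(X), V(Y)\} \quad \text{for all } p \in [0,1],$$

so the aggregator is either maximized at an endpoint or indifferent along the whole segment; a strict preference for an interior “fair” point such as $p = 1/2$ therefore cannot be represented by any single VNM utility function.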
I don’t think that seems reasonable at all, especially when some agents want to engage in massively negative-sum games with others (like those you describe), or have massively discrete utility functions that prevent them from compromising with others (like those you describe). I’m okay with some agents being worse off with the FAI, if that’s the kind of agents they are.
Luckily, I think people, given time to reflect and grow and learn, are not like that, which is probably what made the idea seem reasonable to you.
Do you see CEV as about altruism, instead of cooperation/bargaining/politics? It seems to me the latter is more relevant, since if it’s just about altruism, you could use your own CEV instead of humanity’s CEV. So, if you don’t want anyone to have an incentive to shut down an FAI project, you need to make sure they are not made worse off by an FAI. Of course you could limit this to people who actually have the power to shut you down, but my point is that it’s not entirely up to you which agents the FAI can make worse off.
Right, this could be another way to solve the problem: show that, for the people you do have to make sure are not made worse off, their actual values (given the right definition of “actual values”) are such that a VNM-rational FAI would be sufficient to avoid making them worse off. But even if you can do that, it might still be interesting and productive to look into why VNM-rationality doesn’t seem to be “closed under bargaining”.
Also, suppose I personally (according to my sense of altruism) do not want to make anyone worse off by my actions. Depending on their actual utility functions, it seems that my preferences may not be VNM-rational. So maybe it’s not safe to assume that the inputs to this process are VNM-rational either?
Even if it’s about bargaining rather than about altruism, it’s still okay to have someone worse off under the FAI just so long as they would not be able to predict ahead of time that they would get the short end of the stick. It’s possible to have everyone benefit in expectation by creating an AI that is willing to make some people (whose identity humans cannot predict ahead of time) worse off if it brings sufficient gain to the others.
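A hypothetical illustration of the expectation argument (the numbers here are editorial, not from the thread): suppose the AI will, on the basis of facts no human can anticipate, take 1 util from one of two people in order to give 3 utils to the other. If each person assigns 50% probability to ending up on either side, each expects 0.5 × (−1) + 0.5 × 3 = +1, so both are better off in expectation even though one of them is certain to end up worse off ex post.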
I agree with this, which is why I said “worse off in expected utility” at the beginning of the thread. But I think you need “would not be able to predict ahead of time” in a fairly strong sense, namely that they would not be able to predict it even if they knew all the details of how the FAI worked. Otherwise they’d want to adopt the conditional strategy “learn more about the FAI design, and try to shut it down if I learn that I will get the short end of the stick”. It seems like the easiest way to accomplish this is to design the FAI to explicitly not make certain people worse off, rather than depend on that happening as a likely side effect of other design choices.
I expect that with actual people, in practice, the FAI would leave no one worse off. But I wouldn’t want to hardwire that into the FAI because then its behavior would be too status quo-dependent.
What do you think about Eliezer’s proposed solution of making the FAI’s utility function depend on a coinflip outcome?
It seems like too much of a hack, but maybe it’s not? Can you think of a general procedure for aggregating preferences that would lead to such an outcome (and also leads to sensible outcomes in other circumstances)?
Looking over my old emails, it seems that my email on Jan 21, 2011 proposed a solution to this problem. Namely, if the agents can agree on a point on the Pareto frontier given their current state of knowledge (e.g. the point where agent A and agent B each have 50% probability of winning), then they can agree on a procedure (possibly involving coinflips) whose result is guaranteed to be a Bayesian-rational merged agent, and the procedure yields the specified expected utilities to all agents given their current state of knowledge. Though you didn’t reply to that email, so I guess you found it unsatisfactory in some way...
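A minimal sketch of the kind of procedure being described, as I read it (the function and variable names are assumptions, not from the email): the agents fix the agreed point on the frontier, here the symmetric one, and a single coinflip then selects which fixed utility function the merged agent will maximize. The merged agent is an ordinary VNM maximizer ex post, while the lottery over merged agents gives each original agent an expected utility of 0.5 ex ante.

```python
import random

def build_merged_agent():
    """One-time coinflip; afterwards the merged agent maximizes a single fixed
    utility function, so it is VNM-rational ex post."""
    winner = random.choice(["A", "B"])        # the agreed 50/50 randomization
    def merged_utility(outcome):
        # outcome: dict mapping agent name -> that agent's utility in this outcome
        return outcome[winner]
    return merged_utility, winner

merged_utility, winner = build_merged_agent()
feasible = [
    {"A": 1.0, "B": 0.0},   # universe becomes paperclips
    {"A": 0.0, "B": 1.0},   # universe becomes staples
]
print(winner, max(feasible, key=merged_utility))
# Ex post one agent gets everything; ex ante (before the flip) each agent's
# expected utility is 0.5, the point they agreed on.
```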
I must not have been paying attention to the decision theory mailing list at that time. Thinking it over now, I think technically it works, but doesn’t seem very satisfying, because the individual agents jointly have non-VNM preferences, and are having to do all the work to pick out a specific mixed strategy/outcome. They’re then using a coin-flip + VNM AI just to reach that specific outcome, without the VNM AI actually embodying their joint preferences.
To put it another way, if your preferences can only be implemented by picking a VNM AI based on a coin flip, then your preferences are not VNM rational. The fact that any point on the Pareto frontier can be reached by a coin-flip + VNM AI seems more like a distraction from trying to figure out how to get an AI to correctly embody such preferences.
What do you mean when you say the agents “jointly have non-VNM preferences”? Is there a definition of joint preferences?