Thanks for posting this! This is a fairly satisfying answer to my question from before.
Can you clarify which people you want to apply this theorem to? I don’t think the relevant people should be the set of all humans alive at the time the FAI decides what to do, because that population is not fixed over time and doesn’t have fixed utility functions over time. I can think of situations where I would want the FAI to make a decision that all humans alive at a fixed time would disagree with (for example, suppose most humans die and the only ones left happen to be amoral savages), and I also have no idea how to deal with changing populations with changing utility functions in general.
So it seems the FAI should be aggregating the preferences of a fixed set of people for all time. But this also seems problematic.
I’m not entirely sure. My default answer to that is “all people alive at the time that the singularity occurs”, although you pointed out a possible drawback to that (it incentivizes people to create more people with values similar to their own) in our previous discussion. This is really an instrumental question: What set of people should I suggest get to have their utility functions aggregated into the CEV so as to best maximize my utility? One possible answer is to aggregate the utilities of everyone who worked on or supported the FAI project, but I suspect that, due to the influence of far thinking, that would actually be a terrible way to motivate people to work on FAI, and that the set should be much broader than that.
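To make the instrumental framing concrete, here is a toy sketch of the kind of aggregation I have in mind: a weighted sum of the included people’s utility functions, where the contentious part is entirely the choice of who gets included. (All names, weights, and utility numbers below are invented for illustration; this is a sketch of the shape of the question, not a proposal.)

    # Toy illustration (made-up people, weights, and utilities): the aggregate
    # is a weighted sum over whichever set of people we decide to include, and
    # the contentious part is entirely the choice of `constituency`.

    def aggregate_utility(outcome, constituency, utilities, weights):
        """Weighted sum of the included people's utilities for an outcome."""
        return sum(weights[p] * utilities[p](outcome) for p in constituency)

    # Hypothetical people with toy utility functions over two outcomes, A and B.
    utilities = {
        "alice": lambda o: {"A": 1.0, "B": 0.0}[o],
        "bob":   lambda o: {"A": 0.2, "B": 0.9}[o],
        "carol": lambda o: {"A": 0.1, "B": 1.0}[o],
    }
    weights = {"alice": 1.0, "bob": 1.0, "carol": 1.0}
    outcomes = ["A", "B"]

    # Changing who counts changes what the aggregate prefers.
    for constituency in [{"alice", "bob"}, {"alice", "bob", "carol"}]:
        best = max(outcomes, key=lambda o: aggregate_utility(o, constituency, utilities, weights))
        print(sorted(constituency), "->", best)
    # ['alice', 'bob'] -> A
    # ['alice', 'bob', 'carol'] -> B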
As for aggregating the preferences of a fixed set of people for all time, I don’t think it would be terribly problematic. “People in the future should get exactly what we currently would want them to get if we were perfectly wise and knew their values and circumstances” seems like a pretty good rule. It is, after all, what we want.
And also incentivizes people to kill people with values dissimilar to their own!
Fair enough. Hmm.
That’s a pretty good nail in the coffin. Maybe all people alive at the time of your comment. Or at any point in some interval containing that time, possibly including up to the time the singularity occurs. Although again, these are crude guesses, not final suggestions. This might be a good question to think more about.
It’s not as bad as it sounds. Both arguments are also arguments against democracy, but I don’t think they’re knockdown arguments against democracy (although the general point that democracy can be gamed by brainwashing enough people is good to keep in mind, and I think is a point that Moldbug, for example, is quite preoccupied with). For example, killing people doesn’t appear to be a viable strategy for gaining control of the United States at the moment. Although the killing-people strategy in the FAI case might look more like “the US decides to nuke Russia immediately before the singularity occurs.”
Perhaps not, but it might help maintain control of the USG insofar as popularity increases the chances of reelection and killing (certain) people increases popularity.
Dumb solution: an FAI could have a sense of justice which downweights the utility functions of people who kill and/or procreate in order to game their representation in the AI’s utility function, or something like that, to disincentivize it. (It’s dumb because I don’t know how to operationalize justice; maybe enough people would not cheat, and would want to punish the cheaters, that the FAI would figure that out.)
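Something like the following toy version is what I have in mind, with gaming_scores standing in for exactly the judgment I don’t know how to operationalize (all names and numbers are made up):

    # Toy version of the "downweight the gamers" idea. `gaming_scores` stands in
    # for the part I don't know how to operationalize: some judgment, in [0, 1],
    # of how much each person killed or procreated specifically to skew the
    # aggregate.

    def adjusted_weights(base_weights, gaming_scores, penalty=0.9):
        """Shrink each person's weight in proportion to how much they gamed."""
        return {
            person: base_weights[person] * (1.0 - penalty * gaming_scores.get(person, 0.0))
            for person in base_weights
        }

    base_weights = {"alice": 1.0, "bob": 1.0}
    gaming_scores = {"bob": 0.8}  # hypothetical: bob mass-produced copies of his values

    print(adjusted_weights(base_weights, gaming_scores))
    # alice keeps full weight; bob drops to roughly 0.28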
Also, given what we mostly believe about moral progress, I think defining morality in terms of the CEV of all people who ever lived is probably okay… they’d probably learn to dislike slavery in the AI’s simulation of them.