Given some assumptions about the domains of the utility functions, it is possible to do better than what I described in the previous comment. Let $X_i$ be the space of possible experience histories[1] of user $i$ and $Y$ the space of everything else the utility functions depend on (things that nobody can observe directly). Suppose that the domain of the utility functions is $Z := \prod_i X_i \times Y$. Then, we can define the “denosing[2] operator” $D_i : C(Z) \to C(Z)$ for user $i$ by

$$(D_i u)(x_i, x_{-i}, y) := \max_{x' \in \prod_{j \neq i} X_j} u(x_i, x', y)$$

Here, $x_i$ is the argument of $u$ that ranges in $X_i$, $x_{-i}$ are the arguments that range in $X_j$ for $j \neq i$, and $y$ is the argument that ranges in $Y$.
That is, $D_i$ modifies a utility function by having it “imagine” that the experiences of all users other than $i$ have been optimized, with the experiences of user $i$ and the unobservables held constant.
Let $u_i : Z \to \mathbb{R}$ be the utility function of user $i$, and $d^0 \in \mathbb{R}^n$ the initial disagreement point (everyone dying), where $n$ is the number of users. We then perform cooperative bargaining on the denosed utility functions $D_i u_i$ with disagreement point $d^0$, producing some outcome $\mu^0 \in \Delta(Z)$. Define $d^1 \in \mathbb{R}^n$ by $d^1_i := \mathbb{E}_{\mu^0}[u_i]$. Now we do another cooperative bargaining with $d^1$ as the disagreement point and the original utility functions $u_i$. This gives us the final outcome $\mu^1$.
Among other benefits, there is now much less need to remove outliers. Perhaps, instead of removing them, we still want to mitigate them by applying “amplified denosing”, which also removes the dependence on $Y$.
For this procedure, there is a much better case that the lower bound will be met.
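For concreteness, here is a minimal Python sketch of the two-stage procedure, under simplifying assumptions that are mine rather than the comment’s: two users with tiny hand-made finite spaces and utilities, and “cooperative bargaining” instantiated as Nash bargaining restricted to pure outcomes (the procedure above bargains over distributions $\mu \in \Delta(Z)$). The names (`u_alice`, `u_bob`, the experience labels) are purely illustrative.

```python
from itertools import product

# Two users ("Alice" and "Bob", purely illustrative). X[i] is user i's space
# of possible experience histories; Y is the space of unobservables.
X = [["loud_party", "quiet_party"],   # user 0's possible experiences
     ["smoky_bar", "clean_bar"]]      # user 1's possible experiences
Y = ["y_a", "y_b"]
Z = list(product(X[0], X[1], Y))      # Z = X_0 x X_1 x Y

# Hand-made utilities: each user mildly prefers one of their own options but
# strongly dislikes a feature of the other's preferred experience.
def u_alice(x0, x1, y):
    return (1.0 if x0 == "loud_party" else 0.9) - (0.5 if x1 == "smoky_bar" else 0.0)

def u_bob(x0, x1, y):
    return ((1.0 if x1 == "smoky_bar" else 0.9)
            - (0.5 if x0 == "loud_party" else 0.0)
            + (0.05 if y == "y_a" else 0.0))

users = [u_alice, u_bob]

def denose(u, i):
    """D_i u: evaluate u as if the other users' experiences had been optimized,
    holding user i's own experience x_i and the unobservable y fixed."""
    def Du(*z):
        xi, y = z[i], z[-1]
        other_spaces = [X[j] for j in range(len(X)) if j != i]
        best = float("-inf")
        for alt in product(*other_spaces):
            xs = list(alt)
            xs.insert(i, xi)              # reassemble (x_0, ..., x_{n-1})
            best = max(best, u(*xs, y))
        return best
    return Du

def nash_bargain(utilities, d):
    """Pure-outcome Nash bargaining: pick the z maximizing
    prod_i (u_i(z) - d_i) among outcomes weakly acceptable to everyone."""
    best_z, best_val = None, float("-inf")
    for z in Z:
        gains = [u(*z) - di for u, di in zip(utilities, d)]
        if all(g >= 0 for g in gains):
            val = 1.0
            for g in gains:
                val *= g
            if val > best_val:
                best_z, best_val = z, val
    return best_z

# Stage 1: bargain on the denosed utilities D_i u_i from the disagreement
# point d^0 ("everyone dying", here normalized to utility 0).
d0 = [0.0, 0.0]
mu0 = nash_bargain([denose(u, i) for i, u in enumerate(users)], d0)

# Stage 2: score the stage-1 outcome with the *original* utilities to get d^1,
# then bargain again on the original utilities.
d1 = [u(*mu0) for u in users]
mu1 = nash_bargain(users, d1)

print("stage-1 outcome:", mu0, " d^1 =", d1)
print("final outcome:  ", mu1)
# In this toy case the second bargaining trades away each user's mildly
# preferred own-experience feature to remove the feature the other strongly
# dislikes -- the kind of win-win exchange the second stage is meant to find.
```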
This is very interesting (and “denosing operator” is delightful).
Some thoughts:
If I understand correctly, I think there can still be a problem where user i wants an experience history such that part of the history is isomorphic to a simulation of user j suffering (i wants to fully experience j suffering in every detail).
Here a fixed $x_i$ may entail some fixed $x_j$ for (some copy of) some j.
It seems the above approach can’t then avoid leaving one of i or j badly off: If i is permitted to freely determine the experience of the embedded j copy, the disagreement point in the second bargaining will bake this in: j may be horrified to see that i wants to experience its copy suffer, but will be powerless to stop it (if i won’t budge in the bargaining).
Conversely, if the embedded j is treated as a user which i will imagine is exactly to i’s liking, but who actually gets what j wants, then the selected μ0 will be horrible for i (e.g. perhaps i wants to fully experience Hitler suffering, and instead gets to fully experience Hitler’s wildest fantasies being realized).
I don’t think it’s possible to do anything like denosing to avoid this.
It may seem like this isn’t a practical problem, since we could reasonably disallow such embedding. However, I think that’s still tricky since there’s a less exotic version of the issue: my experiences likely already are a collection of subagents’ experiences. Presumably my maximisation over $x_{\text{joe}}$ is permitted to determine all the $x_{\text{subjoe}}$.
It’s hard to see how you draw a principled line here: the ideal future for most people may easily be transhumanist to the point where today’s users are tomorrow’s subpersonalities (and beyond).
A case that may have to be ruled out separately is where i wants to become a suffering j. Depending on what I consider ‘me’, I might be entirely fine with it if ‘I’ wake up tomorrow as a suffering j (if I’m done living and think j deserves to suffer). Or perhaps I want to clone myself $10^{10}$ times, and then have all copies convert themselves to suffering js after a while. [in general, it seems there has to be some mechanism to distribute resources reasonably—but it’s not entirely clear what that should be]
I think that a rigorous treatment of such issues will require some variant of IB physicalism (in which the monotonicity problem has been solved, somehow). I am cautiously optimistic that a denosing operator exists there which dodges these problems. This operator will declare both the manifesting and evaluation of the source codes of other users to be “out of scope” for a given user. Hence, a preference of i to observe the suffering of j would be “satisfied” by observing nearly anything, since the maximization can interpret anything as a simulation of j.
The “subjoe” problem is different: it is irrelevant, because “subjoe” is not a user; only Joe is a user. All the transhumanist magic that happens later doesn’t change this. Users are the people living during the AI launch, and only them. The status of any future (trans/post)humans is determined entirely according to the utility functions of users. Why? For two reasons: (i) the AI can only have access and stable pointers to existing people, and (ii) we only need the buy-in of existing people to launch the AI. If existing people want future people to be treated well, then they have nothing to worry about, since this preference is part of the existing people’s utility functions.
Ah—that’s cool if IB physicalism might address this kind of thing (still on my to-read list).
Agreed that the subjoe thing isn’t directly a problem. My worry is mainly whether it’s harder to rule out i experiencing a simulation of $x_{\text{sub-}j}$-suffering, since sub-j isn’t a user. However, if you can avoid the suffering js by limiting access to information, the same should presumably work for relevant sub-js.
If existing people want future people to be treated well, then they have nothing to worry about since this preference is part of the existing people’s utility functions.
This isn’t so clear (to me at least) if:
Most, but not all, current users want future people to be treated well.
Part of being “treated well” includes being involved in an ongoing bargaining process which decides the AI’s/future’s trajectory.
For instance, suppose initially 90% of people would like to have an iterated bargaining process that includes future (trans/post)humans as users, once they exist. The other 10% are only willing to accept such a situation if they maintain their bargaining power in future iterations (by whatever mechanism).
If you iterate this process, the bargaining process ends up dominated by users who won’t relinquish any power to future users. 90% of initial users might prefer drift over lock-in, but we get lock-in regardless (the disagreement point also amounting to lock-in).
Unless I’m confusing myself, this kind of thing seems like a problem (not in terms of reaching some non-terrible lower bound, but in terms of realising potential). Wherever there’s this kind of asymmetry/degradation over bargaining iterations, I think there’s an argument for building in a way to avoid it from the start—since anything short of 100% just limits to 0 over time. [it’s by no means clear that we do want to make future people users on an equal footing with today’s people; it just seems to me that we have to do it at step zero or not at all]
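One crude way to make the “limits to 0” claim concrete (a toy model, not anything in the proposal): suppose at each iteration only a fraction $p<1$ of the currently-held bargaining power belongs to users willing to dilute their share in favour of newcomers, while the rest is held fast. Then after $k$ iterations the share of power still available for redistribution is at most $p^k$, and $p^k \to 0$ as $k \to \infty$ for any $p<1$; with $p=0.9$, after 50 iterations at most $0.9^{50}\approx 0.5\%$ remains flexible.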
Ah—that’s cool if IB physicalism might address this kind of thing
I admit that at this stage it’s unclear because physicalism brings in the monotonicity principle that creates bigger problems than what we discuss here. But maybe some variant can work.
For instance, suppose initially 90% of people would like to have an iterated bargaining process that includes future (trans/post)humans as users, once they exist. The other 10% are only willing to accept such a situation if they maintain their bargaining power in future iterations (by whatever mechanism).
Roughly speaking, in this case the 10% preserve their 10% of the power forever. I think it’s fine because I want the buy-in of this 10% and the cost seems acceptable to me. I’m also not sure there is any viable alternative which doesn’t have even bigger problems.
Sure, I’m not sure there’s a viable alternative either. This kind of approach seems promising—but I want to better understand any downsides.
My worry wasn’t about the initial 10%, but about the possibility of the process being iterated such that you end up with almost all bargaining power in the hands of power-keepers.
In retrospect, this is probably silly: if there’s a designable-by-us mechanism that better achieves what we want, the first bargaining iteration should find it. If not, then what I’m gesturing at must either be incoherent, or not endorsed by the 10%, so hard-coding it into the initial mechanism wouldn’t get the buy-in of the 10% to the extent that they understood the mechanism.
In the end, I think my concern is that we won’t get buy-in from a large majority of users: In order to accommodate some proportion with odd moral views it seems likely you’ll be throwing away huge amounts of expected value in others’ views—if I’m correctly interpreting your proposal (please correct me if I’m confused).
Is this where you’d want to apply amplified denosing? So, rather than filtering out the undesirable i, for these i you use:
$$(D_i u)(x_i, x_{-i}, y) := \max_{x' \in \prod_{j \neq i} X_j,\ y' \in Y} u(x_i, x', y')$$

[i.e. ignoring $y$ and imagining it’s optimal]
However, it’s not clear to me how we’d decide who gets strong denosing (clearly not everyone, or we don’t pick a $y$). E.g. if you strong-denose anyone who’s too willing to allow bargaining failure [everyone dies] you might end up filtering out altruists who worry about suffering risks. Does that make sense?
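For concreteness, a minimal sketch of such an “amplified denosing” operator, in the same toy Python setting as the earlier sketch (again an illustrative framing of mine: `X` is the list of per-user experience spaces and `Y` the space of unobservables):

```python
from itertools import product

def denose_amplified(u, i, X, Y):
    """Amplified D_i u: imagine that both the other users' experiences and the
    unobservable y have been optimized, holding only user i's own experience
    x_i fixed. X is the list of per-user experience spaces, Y the space of
    unobservables (as in the toy sketch earlier in the thread)."""
    def Du(*z):
        xi = z[i]
        other_spaces = [X[j] for j in range(len(X)) if j != i]
        best = float("-inf")
        for alt in product(*other_spaces):
            for y in Y:                  # unlike plain denosing, y is imagined optimal too
                xs = list(alt)
                xs.insert(i, xi)         # reassemble (x_0, ..., x_{n-1})
                best = max(best, u(*xs, y))
        return best
    return Du
```

The only difference from the plain denosing operator is the extra maximization over $Y$.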
My worry wasn’t about the initial 10%, but about the possibility of the process being iterated such that you end up with almost all bargaining power in the hands of power-keepers.
I’m not sure what you mean here, but also the process is not iterated: the initial bargaining is deciding the outcome once and for all. At least that’s the mathematical ideal we’re approximating.
In the end, I think my concern is that we won’t get buy-in from a large majority of users:
In order to accommodate some proportion with odd moral views it seems likely you’ll be throwing away huge amounts of expected value in others’ views
I don’t think so? The bargaining system does advantage large groups over small groups.
In practice, I think that for the most part people don’t care much about what happens “far” from them (for some definition of “far”, not physical distance) so giving them private utopias is close to optimal from each individual perspective. Although it’s true they might pretend to care more than they do for the usual reasons, if they’re thinking in “far-mode”.
I would certainly be very concerned about any system that gives even more power to majority views. For example, what if the majority of people are disgusted by gay sex and prefer it not to happen anywhere? I would rather accept things I disapprove of happening far away from me than allow other people to control my own life.
Ofc the system also mandates win-win exchanges. For example, if Alice’s and Bob’s private utopias each contain something strongly unpalatable to the other but not strongly important to the respective customer, the bargaining outcome will remove both unpalatable things.
E.g. if you strong-denose anyone who’s too willing to allow bargaining failure [everyone dies] you might end up filtering out altruists who worry about suffering risks.
I’m fine with strong-denosing negative utilitarians who would truly stick to their guns about negative utilitarianism (but I also don’t think there are many).
Ah, I was just being an idiot on the bargaining system w.r.t. small numbers of people being able to hold it to ransom. Oops. Agreed that more majority power isn’t desirable. [re iteration, I only meant that the bargaining could become iterated if the initial bargaining result were to decide upon iteration (to include more future users). I now don’t think this is particularly significant.]
I think my remaining uncertainty (/confusion) is all related to the issue I first mentioned (embedded copy experiences). It strikes me that something like this can also happen where minds grow/merge/overlap.
This operator will declare both the manifesting and evaluation of the source codes of other users to be “out of scope” for a given user. Hence, a preference of i to observe the suffering of j would be “satisfied” by observing nearly anything, since the maximization can interpret anything as a simulation of j.
Does this avoid the problem if i’s preferences use indirection? It seems to me that a robust pointer to j may be enough: that with a robust pointer it may be possible to implicitly require something like source-code-access without explicitly referencing it. E.g. where i has a preference to “experience j suffering in circumstances where there’s strong evidence it’s actually j suffering, given that these circumstances were the outcome of this bargaining process”.
If i can’t robustly specify things like this, then I’d guess there’d be significant trouble in specifying quite a few (mutually) desirable situations involving other users too. IIUC, this would only be a problem for the denosed bargaining finding a good $d^1$: for the second bargaining on the true utility functions there’s no need to put anything “out of scope” (right?), so win-wins are easily achieved.
[1] In the standard RL formalism this is the space of action-observation sequences $(A \times O)^\omega$.

[2] From the expression “nosy preferences”, see e.g. here.