Yes, I still prefer this (assuming my own private utopia) over paperclips.
For a utilitarian, this doesn’t mean much. What’s much more important is something like, “How close is this outcome to an actual (global) utopia (e.g., with optimized utilitronium filling the universe), on a linear scale?” For example, my rough expectation (without having thought about it much) is that your “lower bound” outcome is about midway between paperclips and actual utopia on a logarithmic scale. In one sense, this is much better than paperclips, but in another sense (i.e., on the linear scale), it’s almost indistinguishable from paperclips, and a utilitarian would only care about the latter and therefore be nearly as disappointed by that outcome as by paperclips.
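(For concreteness, a minimal numerical sketch of the log-scale vs. linear-scale point; the specific figures below are made up purely for illustration.)

```python
# Toy illustration with made-up numbers: suppose utilitronium-utopia is worth
# 1e20 in arbitrary utility units and paperclips ~ 1. An outcome at the
# logarithmic midpoint (the geometric mean) is then worth 1e10: enormously
# better than paperclips multiplicatively, yet only a 1e-10 fraction of utopia
# on the linear scale that a utilitarian cares about.
import math

paperclips = 1e0
utopia = 1e20
log_midpoint = math.sqrt(paperclips * utopia)  # geometric mean = 1e10

print(math.log10(log_midpoint))  # 10.0 -> exactly halfway between 0 and 20 on a log scale
print(log_midpoint / utopia)     # 1e-10 -> nearly indistinguishable from paperclips, linearly
```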
I want to add a little to my stance on utilitarianism. A utilitarian superintelligence would probably kill me and everyone I love, because we are made of atoms that could be used for minds that are more hedonic[1][2][3]. Given a choice between paperclips and utilitarianism, I would still choose utilitarianism. But, if there was a utilitarian TAI project along with a half-decent chance to do something better (by my lights), I would actively oppose the utilitarian project. From my perspective, the people running such a project are essentially enemy combatants.
One way to avoid it is by modifying utilitarianism to only place weight on currently existing people. But this is already not that far from my cooperative bargaining proposal (although still inferior to it, IMO).
Another way to avoid it is by postulating some very strong penalty on death (i.e. discontinuity of personality). But this is not trivial to do, especially without creating other problems. Moreover, from my perspective this kind of thing is a hack that tries to work around the core issue, namely that I am not a utilitarian (along with the vast majority of people).
A possible counterargument is that maybe the superhedonic future minds would be sad to contemplate our murder. But, this seems too weak to change the outcome, even assuming that this version of utilitarianism mandates minds who would want to know the truth and care about it, and that this preference is counted towards “utility”.
A utilitarian superintelligence would probably kill me and everyone I love, because we are made of atoms that could be used for minds that are more hedonic
This seems like a reasonable concern about some types of hedonic utilitarianism. To be clear, I’m not aware of any formulation of utilitarianism that doesn’t have serious issues, and I’m also not aware of any formulation of any morality that doesn’t have serious issues.
But, if there was a utilitarian TAI project along with a half-decent chance to do something better (by my lights), I would actively oppose the utilitarian project. From my perspective, the people running such a project are essentially enemy combatants.
Just to be clear, this isn’t in response to something I wrote, right? (I’m definitely not advocating any kind of “utilitarian TAI project” and would be quite scared of such a project myself.)
Moreover, from my perspective this kind of thing is a hack that tries to work around the core issue, namely that I am not a utilitarian (along with the vast majority of people).
So what are you (and they) then? What would your utopia look like?
Just to be clear, this isn’t in response to something I wrote, right? (I’m definitely not advocating any kind of “utilitarian TAI project” and would be quite scared of such a project myself.)
No! Sorry if I gave that impression.
So what are you (and they) then? What would your utopia look like?
Well, I linked my toy model of partiality before. Are you asking about something more concrete?
Yeah, I mean aside from how much you care about various other people, what concrete things do you want in your utopia?
I have low confidence about this, but my best guess personal utopia would be something like: A lot of cool and interesting things are happening. Some of them are good, some of them are bad (a world in which nothing bad ever happens would be boring). However, there is a limit on how bad something is allowed to be (for example, true death, permanent crippling of someone’s mind and eternal torture are over the line), and overall “happy endings” are more common than “unhappy endings”. Moreover, since it’s my utopia (according to my understanding of the question, we are ignoring the bargaining process and acausal cooperation here), I am among the top along those desirable dimensions which are zero-sum (e.g. play an especially important / “protagonist” role in the events to the extent that it’s impossible for everyone to play such an important role, and have high status to the extent that it’s impossible for everyone to have such high status).
First, you wrote “a part of me is actually more scared of many futures in which alignment is solved, than a future where biological life is simply wiped out by a paperclip maximizer.” So, I tried to assuage this fear for a particular class of alignment solutions.
Second… Yes, for a utilitarian this doesn’t mean “much”. But, tbh, who cares? I am not a utilitarian. The vast majority of people are not utilitarians. Maybe even literally no one is an (honest, not self-deceiving) utilitarian. From my perspective, disappointing the imaginary utilitarian is (in itself) about as upsetting as disappointing the imaginary paperclip maximizer.
Third, what I actually want from multi-user alignment is a solution that (i) is acceptable to me personally, (ii) is acceptable to the vast majority of people (at least if they think it through rationally and are arguing honestly and in good faith), (iii) is acceptable to key stakeholders, (iv) as much as possible, doesn’t leave any Pareto improvements on the table, and (v) is sufficiently Schelling-pointy to coordinate around. Here, “acceptable” means “a lot better than paperclips and not worth starting an AI race/war to get something better”.
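(As a concrete aside: a toy sketch of what one value-aggregation step could look like under a Nash-style bargaining rule, with the disagreement point set to “paperclips”. The outcomes, agents and utilities below are made up purely for illustration; this is not the actual cooperative bargaining proposal referenced above.)

```python
# Nash-style bargaining over a handful of hypothetical outcomes: pick the outcome
# maximizing the product of each agent's gain over the disagreement point.
# This tends to select compromises acceptable to everyone rather than any single
# party's favorite, which is the property desiderata (i)-(iv) are gesturing at.
from math import prod

# Hypothetical utilities in [0, 1]; all numbers are invented for illustration.
outcomes = {
    "paperclips":           {"alice": 0.0, "bob": 0.0, "utilitarian": 0.0},
    "alice_private_utopia": {"alice": 1.0, "bob": 0.2, "utilitarian": 0.1},
    "bob_private_utopia":   {"alice": 0.2, "bob": 1.0, "utilitarian": 0.1},
    "compromise_world":     {"alice": 0.7, "bob": 0.7, "utilitarian": 0.3},
    "pure_utilitronium":    {"alice": 0.1, "bob": 0.1, "utilitarian": 1.0},
}

def nash_pick(outcomes, disagreement=0.0):
    """Return the outcome maximizing the product of utility gains over the disagreement point."""
    return max(outcomes, key=lambda o: prod(u - disagreement for u in outcomes[o].values()))

print(nash_pick(outcomes))  # -> "compromise_world" with these made-up numbers
```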
Second… Yes, for a utilitarian this doesn’t mean “much”. But, tbh, who cares? I am not a utilitarian. The vast majority of people are not utilitarians. Maybe even literally no one is an (honest, not self-deceiving) utilitarian. From my perspective, disappointing the imaginary utilitarian is (in itself) about as upsetting as disappointing the imaginary paperclip maximizer.
I’m not a utilitarian either, because I don’t know what my values are or should be. But I do assign significant credence to the possibility that something in the vicinity of utilitarianism is the right values (for me, or period). Given my uncertainties, I want to arrange the current state of the world so that (to the extent possible), whatever I end up deciding my values are, through things like reason, deliberation, doing philosophy, the world will ultimately not turn out to be a huge disappointment according to those values. Unfortunately, your proposed solution isn’t very reassuring to this kind of view.
It’s quite possible that I (and people like me) are simply out of luck, and there’s just no feasible way to do what we want to do, but it sounds like you think I shouldn’t even want what I want, or at least that you don’t want something like this. Is it because you’re already pretty sure what your values are or should be, and therefore think there’s little chance that millennia from now you’ll end up deciding that utilitarianism (or NU, or whatever) is right after all, and regret not doing more in 2021 to push the world in the direction of [your real values, whatever they are]?
I’m moderately sure what my values are, to some approximation. More importantly, I’m even more sure that, whatever my values are, they are not so extremely different from the values of most people that I should wage some kind of war against the majority instead of trying to arrive at a reasonable compromise. And, in the unlikely event that most people (including me) will turn out to be some kind of utilitarians after all, it’s not a problem: value aggregation will then produce a universe which is pretty good for utilitarians.
I’m moderately sure what my values are, to some approximation. More importantly, I’m even more sure that, whatever my values are, they are not so extremely different from the values of most people [...]
Maybe you’re just not part of the target audience of my OP then… but from my perspective, if I determine my values through the kind of process described in the first quote, and most people determine their values through the kind of process described in the second quote, it seems quite likely that the values end up being very different.
[...] that I should wage some kind of war against the majority instead of trying to arrive at a reasonable compromise.
The kind of solution I have in mind is not “waging war” but, for example, solving metaphilosophy and building an AI that can encourage philosophical reflection in humans or enhance people’s philosophical abilities.
And, in the unlikely event that most people (including me) will turn out to be some kind of utilitarians after all, it’s not a problem: value aggregation will then produce a universe which is pretty good for utilitarians.
What if you turn out to be some kind of utilitarian but most people don’t (because you’re more like the first group in the OP and they’re more like the second group)? Or what if most people would eventually turn out to be some kind of utilitarian in a world without AI, but in a world with AI this won’t happen?
I don’t think people determine their values through either process. I think that they already have values, which are to a large extent genetic and immutable. Instead, these processes determine what values they pretend to have for game-theory reasons. So, the big difference between the groups is which “cards” they hold and/or what strategy they pursue, not an intrinsic difference in values.
But also, if we do model values as the result of some long process of reflection, and you’re worried about the AI disrupting or insufficiently aiding this process, then this is already a single-user alignment issue and should be analyzed in that context first. The presumed differences in moralities are not the main source of the problem here.
I don’t think people determine their values through either process. I think that they already have values, which are to a large extent genetic and immutable. Instead, these processes determine what values they pretend to have for game-theory reasons. So, the big difference between the groups is which “cards” they hold and/or what strategy they pursue, not an intrinsic difference in values.
This is not a theory that’s familiar to me. Why do you think this is true? Have you written more about it somewhere or can link to a more complete explanation?
But also, if we do model values as the result of some long process of reflection, and you’re worried about the AI disrupting or insufficiently aiding this process, then this is already a single-user alignment issue and should be analyzed in that context first. The presumed differences in moralities are not the main source of the problem here.
This seems reasonable to me. (If this was meant to be an argument against something I said, there may have been another miscommunication, but I’m not sure it’s worth tracking that down.)
This is not a theory that’s familiar to me. Why do you think this is true? Have you written more about it somewhere or can link to a more complete explanation?
I’ve been considering writing about this for a while, but so far I don’t feel sufficiently motivated. So, the links I posted upwards in the thread are the best I have, plus vague gesturing in the directions of Hansonian signaling theories, Jaynes’ theory of consciousness and Yudkowsky’s belief in belief.
Isn’t this the main thesis of “The Righteous Mind”?
This comment seems to be consistent with the assumption that the outcome 1 year after the singularity is locked in forever. But the future we’re discussing here is one where humans retain autonomy (?), and in that case, they’re allowed to change their mind over time, especially if humanity has access to a superintelligent aligned AI. I think a future where we begin with highly suboptimal personal utopias and gradually transition into utilitronium is among the more plausible outcomes. Compared with other outcomes where Not Everyone Dies, anyway. Your credence may differ if you’re a moral relativist.
But the future we’re discussing here is one where humans retain autonomy (?), and in that case, they’re allowed to change their mind over time, especially if humanity has access to a superintelligent aligned AI.
What if the humans ask the aligned AI to help them be more moral, and part of what they mean by “more moral” is having fewer doubts about their current moral beliefs? This is what a “status game” view of morality seems to predict, for the humans whose status games aren’t based on “doing philosophy”, which seems to be most of them.
I don’t have an argument for why this couldn’t happen. My position is something like “morality is real, probably precisely quantifiable; seems plausible that in the scenario of humans with autonomy and aligned AI, this could lead to an asymmetry where more people tend toward utilitronium over time”. (Hence my reply; you didn’t seem to consider that possibility.) I could make up some mechanisms for this, but probably you don’t need me for that. It also seems plausible that this doesn’t happen. If it doesn’t happen, maybe the people who get to decide what happens with the rest of the universe tend toward utilitronium. But my model is highly uncertain and doesn’t rule out futures of highly suboptimal personal utopias that persist indefinitely.
I’m interested in your view on this, plus what we can potentially do to push the future in this direction.
I strongly believe that (1) well-being is objective, (2) well-being is quantifiable, and (3) Open Individualism is true (i.e., the concept of identity isn’t well-defined, and you’re subjectively no less continuous with the future self of any other person than with your own future self).
If (1-3) are all true, then utilitronium is the optimal outcome for everyone even if they’re entirely selfish. Furthermore, I expect an AGI to figure this out, and to the extent that it’s aligned, it should communicate that if it’s asked. (I don’t think an AGI will therefore decide to do the right thing, so this is entirely compatible with everyone dying if alignment isn’t solved.)
In the scenario where people get to talk to the AGI freely and it’s aligned, two concrete mechanisms I see are (a) people just ask the AGI what is morally correct and it tells them, and (b) they get some small taste of what utilitronium would feel like, which would make it less scary. (A crucial piece is that they can rationally expect to experience this themselves in the utilitronium future.)
In the scenario where people don’t get to talk to the AGI, who knows. It’s certainly possible that we get a singleton scenario with a few people in charge of the AGI, and they decide to censor questions about ethics because they find the answers scary.
The only org I know of that works on this and shares my philosophical views is QRI. Their goal is to (a) come up with a mathematical space (probably a topological one, maybe a Hilbert space) that precisely describes someone’s subjective experience, (b) find a way to put someone in a scanner and construct that space for them, and (c) find a property of that space that corresponds to their well-being in that moment. The flagship theory is that this property is symmetry. Their model makes stronger claims than (1-3), but if it’s correct, you could get hard evidence on this before AGI, since it would make strong testable predictions about people’s well-being (and they think it could also point to easy interventions, though I don’t understand how that works). Whether it’s feasible to do this before AGI is a different question. I’d bet against it, but I think I give it better odds than any specific alignment proposal. (And I happen to know that Mike agrees that the future is dominated by concerns about AI and thinks this is the best thing to work on.)
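(To make the “symmetry as the well-being property” idea concrete, here is a deliberately crude toy sketch: the matrix representation and the scoring rule below are stand-ins invented for illustration, not QRI’s actual formalism.)

```python
# Represent a "state" as a small weighted connectivity matrix and score how nearly
# it is invariant under relabelings of its nodes: 1.0 means perfectly symmetric.
# This only shows what "a property of the space that tracks well-being" could look
# like mechanically; the real proposal is far more specific.
from itertools import permutations

import numpy as np

def symmetry_score(m: np.ndarray) -> float:
    """Average similarity between m and its relabelings under all node permutations."""
    n = m.shape[0]
    scale = np.linalg.norm(m) + 1e-12
    scores = []
    for p in permutations(range(n)):
        permuted = m[np.ix_(p, p)]  # relabel rows and columns by permutation p
        scores.append(1.0 - np.linalg.norm(m - permuted) / (2 * scale))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
uniform = np.ones((4, 4))       # maximally symmetric toy "state"
noisy = rng.random((4, 4))      # asymmetric toy "state"
print(symmetry_score(uniform))  # 1.0
print(symmetry_score(noisy))    # noticeably lower
```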
So, I think their research is the best bet for getting more people on board with utilitronium since it can provide evidence on (1) and (2). (Also has the nice property that it won’t work if (1) or (2) are false, so there’s low risk of outrage.) Other than that, write posts arguing for moral realism and/or for Open Individualism.
Quantifying suffering before AGI would also plausibly help with alignment, since at least you can formally specify a broad space of outcomes you don’t want, though it certainly doesn’t solve it, e.g. because of inner optimizers.