I disagree with this but am happy your position is laid out. I’ll just try to give my overall understanding and reply to two points.
Like Oliver, it seems like you are implying:
Humans may be nice to other creatures in some sense. But if the fish were to look at the future that we’d achieve for them using the 1/billionth of resources we spent on helping them, it would be as objectionable to them as “murder everyone” is to us.
I think that normal people being pseudokind in a common-sensical way would instead say:
If we are trying to help some creatures, but those creatures really dislike the proposed way we are “helping” them, then we should try a different tactic for helping them.
I think that some utilitarians (without reflection) plausibly would “help the humans” in a way that most humans consider as bad as being murdered. But I think this is an unusual feature of utilitarians, and most people would consult the beneficiaries, observe they don’t want to be murdered, and so not murder them.
I think that saying “Helping someone in a way they like, sufficiently precisely to avoid things like murdering them, requires precisely the right form of caring—and that’s super rare” gives a really misleading sense of how values work and of which targets are narrow. I think this is more obvious if you are talking about how humans would treat a weaker species. If that’s the state of the disagreement, I’m happy to leave it there.
I’m somewhat persuaded by the claim that failing to mention even the possibility of having your brainstate stored, and then run-and-warped by an AI or aliens or whatever later, or run in an alien zoo later, is potentially misleading.
This is an important distinction at 1/trillion levels of kindness, but at 1/billion levels of kindness I don’t even think the humans have to die.
If we are trying to help some creatures, but those creatures really dislike the proposed way we are “helping” them, then we should do something else.
My picture is less like “the creatures really dislike the proposed help”, and more like “the creatures don’t have terribly consistent preferences, and endorse each step of the chain, and wind up somewhere that they wouldn’t have endorsed if you first extrapolated their volition (but nobody’s extrapolating their volition or checking against that)”.
It sounds to me like your stance is something like “there’s a decent chance that most practically-buildable minds pico-care about correctly extrapolating the volition of various weak agents and fulfilling that extrapolated volition”, which I am much more skeptical of than the weaker “most practically-buildable minds pico-care about satisfying the preferences of weak agents in some sense”.
We’re not talking about practically building minds right now; we are talking about humans.
We’re not talking about “extrapolating volition” in general. We are talking about whether—in attempting to help a creature with preferences about as coherent as human preferences—you end up implementing an outcome that creature considers as bad as death.
For example, we are talking about what would happen if humans were trying to be kind to a weaker species that they had no reason to kill, that could nevertheless communicate clearly and had preferences about as coherent as human preferences (while being very alien).
And those creatures are having a conversation amongst themselves before the humans arrive wondering “Are the humans going to murder us all?” And one of them is saying “I don’t know, they don’t actually benefit from murdering us and they seem to care a tiny bit about being nice, maybe they’ll just let us do our thing with 1/trillionth of the universe’s resources?” while another is saying “They will definitely have strong opinions about what our society should look like and the kind of transformation they implement is about as bad by our lights as being murdered.”
In practice, attempts to respect someone’s preferences often involve ideas like autonomy and self-determination and respect for their local preferences. I really don’t think you have to go all the way to extrapolated volition in order to avoid killing everyone.
Is this a reasonable paraphrase of your argument?
Humans wound up caring at least a little about satisfying the preferences of other creatures, not in a “grant their local wishes even if that ruins them” sort of way but in some other intuitively-reasonable manner.
Humans are the only minds we’ve seen so far, and so having seen this once, maybe we start with a 50%-or-so chance that it will happen again.
You can then maybe drive this down a fair bit by arguing about how the content looks contingent on the particulars of how humans developed or whatever, and maybe that can drive you down to 10%, but it shouldn’t be able to drive you down to 0.1%, especially not if we’re talking only about incredibly weak preferences.
If so, one guess is that a bunch of disagreement lurks in this “intuitively-reasonable manner” business.
A possible locus of disagreement: it looks to me like, if you give humans power before you give them wisdom, it’s pretty easy to wreck them while simply fulfilling their preferences. (Ex: lots of teens have dumbass philosophies, and might be dumb enough to permanently commit to them if given that power.)
More generally, I think that if mere-humans met very-alien minds with similarly-coherent preferences, and if the humans had the opportunity to magically fulfill certain alien preferences within some resource-budget, my guess is that the humans would have a pretty hard time offering power and wisdom in the right ways such that this overall went well for the aliens by their own lights (as extrapolated at the beginning), at least without some sort of volition-extrapolation.
(I separately expect that if we were doing something more like the volition-extrapolation thing, we’d be tempted to bend the process towards “and they learn the meaning of friendship”.)
That said, this conversation is updating me somewhat towards “a random UFAI would keep existing humans around and warp them in some direction it prefers, rather than killing them”, on the grounds that the argument “maybe preferences-about-existing-agents is just a common way for rando drives to shake out” plausibly supports it to a threshold of at least 1 in 1000. I’m not sure where I’ll end up on that front.
Another attempt at naming a crux: It looks to me like you see this human-style caring about others’ preferences as particularly “simple” or “natural”, in a way that undermines “drawing a target around the bullseye”-type arguments, whereas I could see that argument working for “grant all their wishes (within a budget)” but am much more skeptical when it comes to “do right by them in an intuitively-reasonable way”.
(But that still leaves room for an update towards “the AI doesn’t necessarily kill us, it might merely warp us, or otherwise wreck civilization by bounding us and then giving us power-before-wisdom within those bounds or suchlike, as might be the sort of whims that rando drives shake out into”, which I’ll chew on.)
More generally, I think that if mere-humans met very-alien minds with similarly-coherent preferences, and if the humans had the opportunity to magically fulfill certain alien preferences within some resource-budget, my guess is that the humans would have a pretty hard time offering power and wisdom in the right ways such that this overall went well for the aliens by their own lights (as extrapolated at the beginning), at least without some sort of volition-extrapolation.
Isn’t the worst case scenario just leaving the aliens alone? If I’m worried I’m going to fuck up some alien’s preferences, I’m just not going to give them any power or wisdom!
I guess you think we’re likely to fuck up the alien’s preferences by the lights of their reflection process, but not by the lights of our reflection process. But the same issue just recurs at the meta level. If I really do care about an alien’s preferences (as it feels like I do), why can’t I also care about their reflection process (which is just a meta-preference)?
I feel like the meta level at which I no longer care about doing right by an alien is basically the meta level at which I stop caring about someone doing right by me. In fact, this is exactly how it seems mentally constructed: what I mean by “doing right by [person]” is “what that person would mean by ‘doing right by me’”. This seems like either something as simple as it naively looks, or sensitive to weird hyperparameters I’m not sure I care about anyway.
(But that still leaves room for an update towards “the AI doesn’t necessarily kill us, it might merely warp us, or otherwise wreck civilization by bounding us and then giving us power-before-wisdom within those bounds or suchlike, as might be the sort of whims that rando drives shake out into”, which I’ll chew on.)
FWIW this is my view. (Assuming no ECL/MSR or acausal trade or other such stuff. If we add those things in, the situation gets somewhat better in expectation I think, because there’ll be trades with faraway places that DO care about our CEV.)
My reading of the argument was something like “bullseye-target arguments refute an artificially privileged target being rated significantly likely under ignorance, e.g. the probability that random aliens will eat ice cream is not 50%. But something like kindness-in-the-relevant-sense is the universal problem faced by all evolved species creating AGI, and is thus not so artificially privileged; as a yes-no question about which we are ignorant, the uniform prior assigns it 50%.” It was more about the hypothesis not being artificially privileged by path-dependent concerns than about the notion being particularly simple, per se.