since they too can be positively or negatively affected in morally relevant ways
taboo morality.
So people want X
and would want X if they were smarter, etc.
But you say, they should want Y.
But you are a person. You are in the group of people who would be extrapolated by CEV. If you would be extrapolated by CEV:
you would either also want X, in which case insisting on Y is strange
or you would be unusual in wanting Y, enough so that your preference on Y is ignored or excessively discounted.
in which case it’s not so strange that you would want to insist on Y. But the question is, does it make sense for other people to agree with this?
There is, admittedly, one sense in which Y = higher scope of concern is different from other Ys. And that is: at least superficially, it might seem like an equivalent continuation of not wanting a lower scope of concern.
If someone says, “I don’t want my AI to include everyone in its scope of concern, just some people” (or just one), then other people might be concerned about this.
They might, on hearing or suspecting this, react accordingly, such as by banding together to stop that person from making that AI, or by rushing to make a different AI at all costs. And that’s relevant because they are actually existing entities we are working together with on this one planet.
So, a credible pre-commitment to value everyone is likely to be approved of, to lead to co-operation and ultimate success.
Also, humans are probably pretty similar. There will be a great deal of overlap in those extrapolated values, and probably not extreme irreconcilable conflict.
But, valuing non-human sentient agents is very different. They are not here (yet). And they might be very, very different.
When you encounter a utility monster that claims it will suffer greatly if you don’t kill yourself, will you just do that?
If someone convinces you “life is suffering”, will you kill all life in the universe? Even if suffering living things want to survive?
Now, once those non-human agentic sentients are here, and they don’t already do what we want, and their power is commensurate with ours, we may want to make deals, implicitly or explicitly, to compromise. Thus including them in the scope of concern.
And if that makes sense in the context, that’s fine...
But if you pre-emptively do it, unconditionally, you are inviting them to take over.
Could they reciprocate our kindness voluntarily? Sure, for some tiny portion of mind-design space that they won’t be in.
In your view, Y is obviously important. At least, so it seems to you right now. You say: if we don’t focus on Y, code it in right from the start, then Y might be ignored. So, we must focus on Y, since it is obviously important.
But when you step outside the group of you and the other people you are game-theoretically connected with, and the precommitments you reasonably might make:
Well then, anyone can say Y is the all-important thing about anything obviously important to them. A religious person might want an AI to follow the tenets of their religion.
This happens to be your religion.
To the people downvoting/disagreeing, tell me:
Where does your belief regarding suffering come from?
Does it come from ordinary human values?
great, CEV will handle it.
Does it come from your own personal unique values?
the rest of humanity has no obligation to go along with that
Does it come from pure logic that the rest of us would realize if we were smart enough?
great, CEV will handle it.
Is it just a brute fact that suffering of all entities whatsoever is bad, regardless of anyone’s views? And furthermore, you have special insight into this, not from your own personal values, or from logic, but...from something else?
then how are you not a religion? where is it coming from?
It is not clear to me exactly what “belief regarding suffering” you are talking about, what you mean by “ordinary human values”/”your own personal unique values”.
As I argue in Section 2.2., there is (at least) a non-negligible chance that s-risks occur as a result of implementing human-CEV, even if s-risks are very morally undesirable (either in a realist or non-realist sense).
Please read the paper, and if you have any specific points of disagreement cite the passages you would like to discuss. Thank you
It is not clear to me exactly what “belief regarding suffering” you are talking about, what you mean by “ordinary human values”/”your own personal unique values”.
Belief regarding suffering: the belief that s-risks are bad, independently of human values as would be represented in CEV.
Ordinary human values: what most people have.
Your own personal unique values: what you have, but others don’t.
Please read the paper, and if you have any specific points of disagreement cite the passages you would like to discuss. Thank you
In my other reply comment, I pointed out disagreements with particular parts of the paper you cited in favour of your views. My fundamental disagreement, though, is that you are relying on an unjustified assumption, repeated in your comment above:
even if s-risks are very morally undesirable (either in a realist or non-realist sense)
The assumption being that s-risks are “very morally undesirable”, independently of human desires (represented in CEV).
Okay, I understand better now.
You ask: “Where does your belief regarding the badness of s-risks come from?”
And you provide 3 possible answers I am (in your view) able to choose between:
1. “From what most people value”, 2. “From what I personally value but others don’t”, or 3. “From pure logic that the rest of us would realize if we were smart enough”.
However, the first two answers do not seem to be answers to the question. My beliefs about what is or is not morally desirable do not come from “what most people value” or “what I personally value but others don’t”. In one sense, my beliefs about ethics, like everyone’s beliefs about ethics, come from various physical causes (personal experiences, conversations I have had with other people, papers I have read), just as in the formation of all other kinds of beliefs. There is another sense in which my beliefs about ethics seem to me to be justified by reasons/preferences. This second sense, I believe, is the one you are interested in discussing. And what exactly the nature is of the reasons or preferences that make me hold certain ethical views is what the discipline of meta-ethics is about. Figuring out or arguing for which is the right position in meta-ethics is outside the scope of this paper, which is why I have not addressed it in the paper. Below I will reply to your other comment and discuss the meta-ethical issue further.
Suppose that my definition of “suffering” (as a moral term) was “suffering by a human”, my definition of “s-risk” was “risk of massive suffering by humans”, and my definition of “human” was a member of the biological species Homo sapiens (or a high-fidelity upload of one). You tell me we have to pay attention to animal suffering and animal s-risks, and I say “while the biological phenomenon of pain in humans and in animals is identical, in my ethical system humans have moral weight and animals don’t. So animal pain is not, morally speaking, suffering, and risk of it is not s-risk.” You say “oh yes it is”, and I say “by your ethical system’s axioms, yes, but not by mine”. How do you then persuade me otherwise, using only ethics and logic, when you and I don’t operate in the same ethical system? You’re just saying “I have axiom A”, and my reply is “good for you, I have axiom B”.
You can’t use logic here, because you and your interlocutor don’t share the same axiom system. However, you can say “A society that used my proposed ethical system would produce outcome X, whereas a society using yours would produce outcome Y, and pretty-much every human finds X cute and fluffy and Y nauseating, that’s just the way human instincts are. So even though all you care about is humans, my ethical system is better.” That’s a valid argument that might win; ethical logic isn’t. You have to appeal to instinct and/or aesthetics, because that’s all you and your interlocutor (hopefully) agree on.
Hi Roger, first, the paper is addressed to those who already do believe that all sentient beings deserve moral consideration and that their suffering is morally undesirable. I do not argue for these points in the paper, since they are already universally accepted in the moral philosophy literature.
This is why, for instance, I write the following: “sentience in the sense understood above as the capacity of having positively or negatively valenced phenomenally conscious experiences is widely regarded and accepted as a sufficient condition for moral patienthood (Clarke, S., Zohny, H. & Savulescu, J., 2021)”.
Furthermore, it is just empirically not the case that people cannot be convinced “only by ethics and logic”: for instance, many people who read Peter Singer’s Animal Liberation changed their views in light of the arguments he provided in the first chapter and came to believe that non-human animals deserve equal moral consideration of interests. Changing one’s ethical views when presented with ethical arguments is a standard practice among moral philosophers when researching and reading moral philosophy. Of course, there is the is/ought gap, but this does not entail that one cannot convince someone that the most coherent version of their most fundamental ethical intuitions does not, in fact, lead where they believe it leads, but instead leads somewhere else, to a different conclusion. This happens all the time among moral philosophers: one presents an argument in favour of a view, and in many instances, many philosophers are convinced by that argument and change their view.
In this paper, I was not trying to argue that non-human animals deserve moral consideration or that s-risks are bad; as I said, I have assumed this. What I try to argue is that if this is true, then, in some decision-making situations, we would have some strong pro-tanto moral reasons to implement SCEV. In fact, I do not even argue conclusively that what we should do is try to implement SCEV.
the paper is addressed to those who already do believe that all sentient beings deserve moral consideration and that their suffering is morally undesirable.
I think you should state these assumptions more clearly at the beginning of the paper, since you appear to be assuming what you are claiming to prove. You are also making incorrect assumptions about your audience, especially when posting it to Less Wrong. The idea that Coherent Extrapolated Volition, Utilitarianism, or “Human Values” applies only to humans, or perhaps only to sapient beings, is quite widespread on Less Wrong.
I do not argue for these points in the paper, since they are already universally accepted in the moral philosophy literature
I’m not deeply familiar with the most recent few decades of the moral philosophy literature, so I won’t attempt to argue this in a recent context, if that is what you in fact mean by “the moral philosophy literature” (though I have to say that I do find any claim of the form “absolutely everyone who matters agrees with me” inherently suspicious). However, Philosophy is not a field that has made such rapid recent advances that one can simply ignore all but the last few decades, and for the moral philosophy literature of the early 20th century and the preceding few millennia (which includes basically every philosopher named in a typical introductory guide to Moral Philosophy), this claim is just blatantly false, even to someone from outside the academic specialty. For example, I am quite certain that Nietzsche, Hobbes, Thomas Aquinas and Plato would all have variously taken issue with the proposition that humans and ants deserve equal moral consideration, if ants can be shown to experience pain (though the Jains would not). Or perhaps you would care to cite quotes from each of them clearly supporting your position? Indeed, for much of the last two millennia, Christian moral philosophy made it entirely clear that they believed animals do not have souls, and thus did not deserve the same moral consideration as humans, and that humans held a unique role in God’s plan, as the only creature made in His image and imbued with souls. So claiming that your position is “already universally accepted in the moral philosophy literature” while simply ignoring O(90%) of that literature appears specious to me. Perhaps you should also briefly outline in your paper which portions of, or schools from, the moral philosophy literature in fact agree with your unstated underlying assumption?
What I mean by “moral philosophy literature” is the contemporary moral philosophy literature; I should have been more specific, my bad. And in contemporary philosophy, it is universally accepted (though of course, there might exist one philosopher or another who disagrees) that sentience, in the sense understood above as the capacity of having positively or negatively valenced phenomenally conscious experiences, is sufficient for moral patienthood. If this is the case, then it is enough to cite a published work or works in which this is evident. This is why I cite Clarke, S., Zohny, H. & Savulescu, J., 2021. You can go see this recently edited book on moral status, in which this claim is assumed throughout, and in the book you can find the sources for its justification.
OK, note to self: If we manage to create a superintelligence, and give it access to the contemporary moral philosophy literature, it will euthanize us all and feed us to ants. Good to know!
I do not think this follows; the “consensus” is that sentience is sufficient for moral status. It is not clearly the case that giving some moral consideration to non-human sentient beings would lead to the scenario you describe. Though see: https://www.tandfonline.com/doi/full/10.1080/21550085.2023.2200724
“Some”, or “pro-tanto” unspecified amount of moral consideration, I agree in principle. “Equal” or even “anywhere within a few orders of magnitude of equal”, and we go extinct. Ants need ~10,000,000 times less resources per individual than humans, so if you don’t give humans around ~10,000,000 times the moral value, we end up extinct in favor of more ants. For even tinier creatures, the ratios are even larger. Explaining why moral weight ought to scale linearly with body weight over many orders of magnitude is a challenging moral position to argue for, but any position that doesn’t closely approximate that leads to wildly perverse incentives and the “repugnant conclusion”. The most plausible-sounding moral argument I’ve come up with is that moral weight should be assigned somewhat comparably per-species at a planetary level, and then shared out (equally?) per individual member of a species, so smaller more-numerous species end up with a smaller share per individual. However, given my attitude of ethical system design, I view these sorts of arguments as post-facto political-discussion justifications, and am happy to do what works, and between species of very different sizes, the only thing that works is that moral weight scales roughly linearly with adult body weight (or more accurately, resource needs).
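To make the resource arithmetic above concrete, here is a minimal sketch in Python (the budget and the ~10,000,000x ants-to-humans resource ratio are illustrative numbers I am plugging in, not figures from the paper): with equal per-individual moral weights, the welfare-maximizing allocation gives everything to ants, whereas weights proportional to per-individual resource needs leave the split neutral.

```python
# Illustrative sketch only: assumed numbers, and a deliberately crude
# "value = weight x number of individuals sustained" welfare proxy.
RESOURCE_BUDGET = 1e10        # arbitrary units of resources to allocate
HUMAN_COST = 1.0              # resources needed to sustain one human
ANT_COST = HUMAN_COST / 1e7   # an ant assumed to need ~10,000,000x less

def total_moral_value(human_share, human_weight, ant_weight):
    """Weighted total if `human_share` of the budget goes to humans
    and the remainder goes to ants."""
    n_humans = RESOURCE_BUDGET * human_share / HUMAN_COST
    n_ants = RESOURCE_BUDGET * (1 - human_share) / ANT_COST
    return human_weight * n_humans + ant_weight * n_ants

for share in (0.0, 0.5, 1.0):
    # Equal weights: any resources diverted to humans lose ~10,000,000x
    # in total value, so the optimum is zero humans.
    equal = total_moral_value(share, human_weight=1.0, ant_weight=1.0)
    # Weights scaled to resource needs: the total is the same for every
    # split, so humans are no longer outcompeted by sheer numbers.
    scaled = total_moral_value(share, human_weight=HUMAN_COST, ant_weight=ANT_COST)
    print(f"human share {share:.1f}: equal weights {equal:.3e}, "
          f"resource-scaled weights {scaled:.3e}")
```

Under equal weights the maximum sits at a human share of zero, which is the extinction outcome described above; under resource-scaled weights every split scores the same.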
I enjoyed Jeff Sebo’s paper, thank-you for the reference, and mostly agree with his analysis, if not his moral intuitions — but I really wish he had put some approximate numbers in on occasion to show just how many orders of magnitude the ratios can be between the “large” and “small” things he often discusses. Those words conjure up things within an order of magnitude of each other, not many orders-of-magnitude apart. Words like “vast” and “minute” might have been more appropriate, even before he got on to discussing microbes. But I loved Pascal’s Bugging.
Overall, thank-you for the inspiration: Due to your paper and this conversation, I’m now working on another post for my AI, Alignment and Ethics sequence where I’ll dig in more depth into this exact question of the feasibility or otherwise of granting moral worth to sentient animals, from my non-moral-absolutist ethical-system design viewpoint. This one’s a really hard design problem that requires a lot of inelegant hacks. My urgent advice would be to steer clear of it, at least unless you have very capable ASI assistance and excellent nanotech and genetic engineering, plus some kind of backup plan in case you made a mistake and persuaded your ASIs to render humanity extinct. Something like an even more capable ASI running the previous moral system, ready to step in under prespecified circumstances, comes to mind, but then how do you get it to not step in due to ethical disagreement?
I am glad to hear you enjoyed the paper and that our conversation has inspired you to work more on this issue! As I mentioned, I now find the worries you lay out in the first paragraph significantly more pressing; thank you for pointing them out!
Hi simon,
it is not clear to me which of the points of the paper you object to exactly, and I feel some of your worries may already be addressed in the paper.
For instance, you write: “And that’s relevant because they are actually existing entities we are working together with on this one planet.” First, some sentient non-humans already exist, that is, non-human animals. Second, the fact that we can work or not work with given entities does not seem to be what is relevant in determining whether they should be included in the extrapolation base or not, as I argue in sections 2., 2.1., and 4.2.
For utility-monster-type worries and worries about the possibility that “misaligned” digital minds would take control, see section 3.2.
You write: “Well then, anyone can say Y is the all-important thing about anything obviously important to them. A religious person might want an AI to follow the tenets of their religion.” Yes, but (as I argue in 2.1 and 2.2) there are strong reasons to include all sentient beings. And (to my knowledge) there are no good reasons to support any religion. As I argue in the paper, and as has been argued elsewhere, the first values you implement will change the ASI’s behaviour in expectation, and as a result, what values to implement first cannot be left to the AI to figure out. For instance, because we have better reasons to believe that all sentient beings can be positively or negatively affected in morally relevant ways than to believe that only the members of a specific religion matter, it is likely better to include all sentient beings than to include only the members of the religion. See Section 2.
Thanks for the reply.
We don’t work together with animals—we act towards them, generously or not.
That’s key because, unlike for other humans, we don’t have an instrumental reason to include them in the programmed value calculation, and to precommit to doing so, etc. For animals, it’s more of a terminal goal. But if that terminal goal is a human value, it’s represented in CEV. So where does this terminal goal over and above human values come from?
Regarding 2:
There is (at least) a non-negligible probability that an adequate implementation of the standard CEV proposal results in the ASI causing or allowing the occurrence of risks of astronomical suffering (s-risks).
You don’t justify why this is a bad thing over and above human values as represented in CEV.
Regarding 2.1:
The normal CEV proposal, like CEO-CEV and men-CEV, excludes a subset of moral patients from the extrapolation base.
You just assume it, that the concept of “moral patients” exists and includes non-humans. Note, to validly claim that CEV is insufficient, it’s not enough to say that human values include caring for animals—it has to be something independent of or at least beyond human values. But what?
Regarding 4.2:
However, as seen above, it is not the case that there are no reasons to include sentient non-humans since they too can be positively or negatively affected in morally relevant ways by being included in the extrapolation base or not.
Again, existence and application of the “moral relevance” concept over and above human values just assumed, not justified.
Regarding 3.2:
At any given point in time t, the ASI should take those actions that would in expectation most fulfil the coherent extrapolated volition of all sentient beings that exist in t.
Good, by focusing on the particular time, at least you aren’t guaranteeing that the AI will replace us with utility monsters. But if utility monsters do come to exist or be found (e.g. utility monster aliens) for whatever reason, the AI will still side with them, because:
Contrary to what seems to be the case in the standard CEV proposal, the interests of future not-yet-existing sentient beings, once they exist, would not be taken into account merely to the extent to which the extrapolated volitions of currently existing individuals desire to do so.
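To spell out the time-indexing being discussed, one way to write the quoted rule as a formula (the notation is mine, not the paper’s) is:

a_t^{*} \in \arg\max_{a \in A_t} \; \mathbb{E}\!\left[ \sum_{i \in S_t} w_i \, \widehat{V}_i(a) \right]

where S_t is the set of sentient beings existing at time t, \widehat{V}_i(a) is how well action a fulfils being i’s extrapolated volition, and w_i is its weight. Beings outside S_t get no term of their own, which is the reassuring part; but the moment a utility monster enters S_t, it gets a term like any other member, which is the worry.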
Also, I have to remark on:
Finally, it should also be noted that this proposal of SCEV (as CEV) is not intended as a realist theory of morality, it is not a description of the metaphysical nature of what constitutes the ‘good’. I am not proposing a metaethical theory but merely what would be the most morally desirable ambitious value learning proposal for an ASI.
You assert your approach is “the most morally desirable” while disclaiming moral realism. So where does that “most morally desirable” come from?
And in response to your comment:
Yes, but (as I argue in 2.1 and 2.2) there are strong reasons to include all sentient beings. And (to my knowledge) there are no good reasons to support any religion.
The “reasons” are simply unjustified assumptions, like “moral relevance” existing (independent of our values, game-theoretic considerations including pre-commitments, etc.) (and yes, you don’t explicitly say in so many words that it exists independent of those things, but your argument doesn’t hold unless it does).
unlike for other humans, we don’t have an instrumental reason to include them in the programmed value calculation, and to precommit to doing so, etc. For animals, it’s more of a terminal goal.
First, it seems plausible that we (in fact) do not have instrumental reasons to include all humans. As I argue in section 4.2., there are some humans, such as “children, existing people who’ve never heard about AI or people with severe physical or cognitive disabilities unable to act on and express their own views on the topic”, who, if included, would also only be included because of our terminal goals, because they too matter.
If your view is that you only have reasons to include those whom you have instrumental reasons to include, then on your view the members of an AGI lab that developed ASI ought to include only themselves if they believe (in expectation) that they can successfully do so. This view is implausible: it is implausible that this is what they would have most moral reason to do.
Whether this is implausible or not is a discussion about normative and practical ethics, and (a bit contrary to what you seem to believe) these kinds of discussions can be had, are had all the time inside and outside academia, and are fruitful in many instances.
if that terminal goal is a human value, it’s represented in CEV
As I argue in Section 2.2, it is not clear that by implementing CEV, s-risks would be prevented for certain. Rather, there is a non-negligible chance that they are not. If you want to argue that s-risks would be prevented for certain, please address the object-level arguments I present. If you want to argue that the occurrence of s-risks would not be bad, you want to argue for a particular view in normative and practical ethics. As a result, you should argue for it by presenting arguments that justify certain views in these disciplines.
You don’t justify why this is a bad thing over and above human values as represented in CEV.
This seems to be the major point of disagreement. In the paper, when I say s-risks are morally undesirable, i.e. bad, I use “bad” and “morally undesirable” as they are commonly used in analytic philosophy and outside academia, when, for example, someone says “Hey, you can’t do that, that’s wrong”.
What exactly I, you or anyone else mean when we utter the words “bad”, “wrong”, and “morally undesirable” is the main question in the field of meta-ethics. Meta-ethics is very difficult, and contrary to what you suggest, I do not reject/disclaim moral realism, neither in the paper nor in my belief system. But I also do not endorse it. I am agnostic regarding this central question in meta-ethics; I suspend my judgment because I believe I have not yet sufficiently familiarized myself with the various arguments for or against the various possible positions. See: https://plato.stanford.edu/entries/metaethics/
This paper is not about metaethics, it is about practical ethics, and some normative ethics. It is possible to do both practical ethics and normative ethics while being agnostic or not being correct about metaethics, as is exemplified by the whole academic fields of practical and normative ethics. In the same way that it is possible to attain knowledge about physics, for instance, without having a complete theory of what knowledge is.
If you want, you can try to show that my paper, which talks about normative ethics, is incorrect based on considerations regarding metaethics, but to do so, it would be quite helpful if you were able to present an argument with premises and a conclusion, instead of asking questions.
Thank you for specifically citing passages of the paper in your comment.
If your view is that you only have reasons to include those whom you have instrumental reasons to include, then on your view the members of an AGI lab that developed ASI ought to include only themselves if they believe (in expectation) that they can successfully do so. This view is implausible: it is implausible that this is what they would have most moral reason to do.
I note that not everyone considers that implausible, for example Tamsin Leake’s QACI takes this view.
I disagree with both Tamsin Leake and with you: I think that all humans, but only humans, makes the most sense. But for concrete reasons, not for free-floating moral reasons.
I was writing the following as a response to NicholasKees’ comment, but I think it belongs better as a response here:
...imagine you are in a mob in such a “tyranny of the mob” kind of situation, with mob-CEV. For the time being, imagine a small mob.
You tell the other mob members: “we should expand the franchise/function to other people not in our mob”.
OK, should the other mob members agree?
maybe they agree with you that it is right that the function should be expanded to other humans. In which case mob-CEV would do it automatically.
Or they don’t agree. And still don’t agree after full consideration/extrapolation.
If they don’t agree, what do you do? Ask Total-Utility-God to strike them down for disobeying the One True Morality?
At this point you are stuck, if the mob-CEV AI has made the mob untouchable to entities outside it.
But there is something you could have done earlier. Earlier, you could have allied with other humans outside of the mob, to pressure the would-be-mob members to pre-commit to not excluding other humans.
And in doing so, you might have insisted on including all humans, not specifically the humans you were explicitly allying with, even if you didn’t directly care about everyone, because:
the ally group might shift over time, or people outside the ally group might make their own demands
if the franchise is not set to a solid Schelling point (like all humans) then people currently inside might still worry about the lines being shifted to exclude them.
Thus, you include the Sentinelese, not because you’re worried about them coming over to demand to be included, but because if you draw the line to exclude them then it becomes more ambiguous where the line should be drawn, and relatively low (but non-zero) influence members of the coalition might be worried about also being excluded. And, as fellow humans, it is probably relatively low cost to include them—they’re unlikely to have wildly divergent values or be utility monsters etc.
You might ask, is it not also a solid Schelling point to include all entities whatsoever?
First, not really, we don’t have good definitions of “all sentient beings”, not nearly as good as “all humans”. It might be different if, e.g., we had time travel, such that we would also have to worry about intermediate evolutionary steps between humans and non-human animals, but we don’t.
In the future, we will have more ambiguous cases, but CEV can handle it. If someone wants to modify themselves into a utility monster, maybe we would want to let them do so, but discount their weighting in CEV to a more normal level when they do it.
And second, it is not costless to expand the franchise. If you allow non-humans preemptively you are opening yourself up to, as an example, the xenophobic aliens scenario, but also potentially who-knows-what other dangerous situations since entities could have arbitrary values.
And that’s why expanding the franchise to all humans makes sense, even if individuals don’t care about other humans that much, but expanding to all sentients does not, even if people do care about other sentients.
In response to the rest of your comment:
If you want to argue that s-risks would be prevented for certain, please address the object-level arguments I present.
If humans would want to prevent s-risks, then they would be prevented. If humans would not want to prevent s-risks, they would not be prevented.
If you want to argue that the occurrence of s-risks would not be bad, you want to argue for a particular view in normative and practical ethics.
You’re the one arguing that people should override their actual values, and instead of programming an AI to follow their actual values, do something else! Without even an instrumental reason to do so (other than alleged moral considerations that aren’t in their actual values, but coming from some other magical direction)!
Asking someone to do something that isn’t in their values, without giving them instrumental reasons to do so, makes no sense.
It is you who needs a strong meta-ethical case for that. It shouldn’t be the objector who has to justify not overriding their values!