Hi simon,
It is not clear to me which of the points of the paper you object to exactly, and I feel some of your worries may already be addressed in the paper.
For instance, you write: “And that’s relevant because they are actually existing entities we are working together with on this one planet.” First, some sentient non-humans already exist, that is, non-human animals. Second, the fact that we can work or not work with given entities does not seem to be what is relevant in determining whether they should be included in the extrapolation base or not, as I argue in sections 2., 2.1., and 4.2.
For utility-monster-type worries and worries about the possibility that “misaligned” digital minds would take control see section 3.2.
You write: “Well then, anyone can say Y is the all-important thing about anything obviously important to them. A religious person might want an AI to follow the tenets of their religion.” Yes, but (as I argue in 2.1 and 2.2) there are strong reasons to include all sentient beings. And (to my knowledge) there are no good reasons to support any religion. As I argue in the paper, and as has been argued elsewhere, the first values you implement will change the ASI’s behaviour in expectation, and as a result, what values to implement first cannot be left to the AI to figure out. For instance, because we have better reasons to believe that all sentient beings can be positively or negatively affected in morally relevant ways than to believe that only the members of a specific religion matter, it is likely better to include all sentient beings than to include only the members of the religion. See Section 2.
Thanks for the reply.
We don’t work together with animals—we act towards them, generously or not.
That’s key because, unlike for other humans, we don’t have an instrumental reason to include them in the programmed value calculation, and to precommit to doing so, etc. For animals, it’s more of a terminal goal. But if that terminal goal is a human value, it’s represented in CEV. So where does this terminal goal over and above human values come from?
Regarding 2:
There is (at least) a non-negligible probability that an adequate implementation of the standard CEV proposal results in the ASI causing or allowing the occurrence of risks of astronomical suffering (s-risks).
You don’t justify why this is a bad thing over and above human values as represented in CEV.
Regarding 2.1:
The normal CEV proposal, like CEO-CEV and men-CEV, excludes a subset of moral patients from the extrapolation base.
You just assume that the concept of “moral patients” exists and includes non-humans. Note that to validly claim CEV is insufficient, it’s not enough to say that human values include caring for animals—it has to be something independent of, or at least beyond, human values. But what?
Regarding 4.2:
However, as seen above, it is not the case that there are no reasons to include sentient non-humans since they too can be positively or negatively affected in morally relevant ways by being included in the extrapolation base or not.
Again, existence and application of the “moral relevance” concept over and above human values just assumed, not justified.
Regarding 3.2:
At any given point in time t, the ASI should take those actions that would in expectation most fulfil the coherent extrapolated volition of all sentient beings that exist in t.
Good, by focusing on the particular time at least you aren’t guaranteeing that the AI will replace us with utility monsters. But if utility monsters do come to exist or are found (e.g. utility monster aliens) for whatever reason, the AI will still side with them, because:
Contrary to what seems to be the case in the standard CEV proposal, the interests of future not-yet-existing sentient beings, once they exist, would not be taken into account merely to the extent to which the extrapolated volitions of currently existing individuals desire to do so.
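To make the time-indexing explicit, here is one rough way to formalize the quoted rule (my own notation, not the paper’s; the weighted-sum aggregation is just one illustrative choice):

$$a_t^{*} \in \arg\max_{a \in A_t} \; \mathbb{E}\left[\sum_{i \in S_t} w_i \, V_i(a)\right]$$

where $S_t$ is the set of sentient beings that exist at time $t$, $A_t$ is the set of available actions, $V_i(a)$ is how well action $a$ fulfils $i$’s extrapolated volition, and $w_i$ is $i$’s weight. Since $S_t$ is re-evaluated at every $t$, any being that comes to exist later, utility monsters included, enters the objective directly at that time, not merely via the extrapolated volitions of those who existed earlier.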
Also, I have to remark on:
Finally, it should also be noted that this proposal of SCEV (as CEV) is not intended as a realist theory of morality, it is not a description of the metaphysical nature of what constitutes the ‘good’. I am not proposing a metaethical theory but merely what would be the most morally desirable ambitious value learning proposal for an ASI.
You assert your approach is “the most morally desirable” while disclaiming moral realism. So where does that “most morally desirable” come from?
And in response to your comment:
Yes, but (as I argue in 2.1 and 2.2) there are strong reasons to include all sentient beings. And (to my knowledge) there are no good reasons to support any religion.
The “reasons” are simply unjustified assumptions, like “moral relevance” existing independently of our values, game-theoretic considerations including pre-commitments, etc. (And yes, you don’t say in so many words that it exists independently of those things, but your argument doesn’t hold unless it does.)
unlike for other humans, we don’t have an instrumental reason to include them in the programmed value calculation, and to precommit to doing so, etc. For animals, it’s more of a terminal goal.
First, it seems plausible that we do not, in fact, have instrumental reason to include all humans, as I argue in section 4.2. There are some humans, such as “children, existing people who’ve never heard about AI or people with severe physical or cognitive disabilities unable to act on and express their own views on the topic”, who, if included, would be included only because of our terminal goals, because they too matter.
If your view is that you only have reasons to include those whom you have instrumental reasons to include, then, on your view, the members of an AGI lab that developed an ASI ought to include only themselves if they believe (in expectation) that they can successfully do so. This view is implausible: it is implausible that this is what they would have most moral reason to do.
Whether this is implausible or not is a discussion in normative and practical ethics, and (a bit contrary to what you seem to believe) these kinds of discussions can be had, are had all the time inside and outside academia, and are fruitful in many instances.
if that terminal goal is a human value, it’s represented in CEV
As I argue in Section 2.2, it is not clear that implementing CEV would prevent s-risks for certain. Rather, there is a non-negligible chance that it would not. If you want to argue that s-risks would be prevented for certain, please address the object-level arguments I present. If you want to argue that the occurrence of s-risks would not be bad, you want to argue for a particular view in normative and practical ethics. As a result, you should argue for it by presenting arguments that justify certain views in these disciplines.
You don’t justify why this is a bad thing over and above human values as represented in CEV.
This seems to be the major point of disagreement. In the paper, when I say s-risks are morally undesirable, i.e. bad, I use “bad” and “morally undesirable” as they are commonly used in analytic philosophy, and outside academia, when, for example, someone says “Hey, you can’t do that, that’s wrong”.
What exactly I, you, or anyone else mean when we utter the words “bad”, “wrong”, and “morally undesirable” is the main question in the field of meta-ethics. Meta-ethics is very difficult, and contrary to what you suggest, I do not reject or disclaim moral realism, neither in the paper nor in my belief system. But I also do not endorse it. I am agnostic regarding this central question in meta-ethics: I suspend my judgment because I believe I have not yet sufficiently familiarized myself with the various arguments for and against the various possible positions. See: https://plato.stanford.edu/entries/metaethics/
This paper is not about metaethics; it is about practical ethics, and some normative ethics. It is possible to do both practical and normative ethics while being agnostic about, or mistaken about, metaethics, as is exemplified by the whole academic fields of practical and normative ethics, in the same way that it is possible to attain knowledge about physics, for instance, without having a complete theory of what knowledge is.
If you want, you can try to show that my paper, which talks about normative ethics, is incorrect based on considerations regarding metaethics; but to do so, it would be quite helpful if you were able to present an argument with premises and a conclusion, instead of asking questions.
Thank you for specifically citing passages of the paper in your comment.
If your view is that you only have reasons to include those whom you have instrumental reasons to include, then, on your view, the members of an AGI lab that developed an ASI ought to include only themselves if they believe (in expectation) that they can successfully do so. This view is implausible: it is implausible that this is what they would have most moral reason to do.
I note that not everyone considers that implausible; for example, Tamsin Leake’s QACI takes this view.
I disagree with both Tamsin Leake and with you: I think that including all humans, but only humans, makes the most sense. But for concrete reasons, not for free-floating moral reasons.
I was writing the following as a response to NicholasKees’ comment, but I think it belongs better as a response here:
...imagine you are in a mob in such a “tyranny of the mob” kind of situation, with mob-CEV. For the time being, imagine a small mob.
You tell the other mob members: “we should expand the franchise/function to other people not in our mob”.
OK, should the other mob members agree?
Maybe they agree with you that it is right that the function should be expanded to other humans, in which case mob-CEV would do it automatically.
Or they don’t agree. And still don’t agree after full consideration/extrapolation.
If they don’t agree, what do you do? Ask Total-Utility-God to strike them down for disobeying the One True Morality?
At this point you are stuck, if the mob-CEV AI has made the mob untouchable to entities outside it.
But there is something you could have done earlier: you could have allied with other humans outside the mob to pressure the would-be mob members to pre-commit to not excluding other humans.
And in doing so, you might have insisted on including all humans, not specifically the humans you were explicitly allying with, even if you didn’t directly care about everyone, because:
the ally group might shift over time, or people outside the ally group might make their own demands
if the franchise is not set to a solid Schelling point (like all humans) then people currently inside might still worry about the lines being shifted to exclude them.
Thus, you include the Sentinelese, not because you’re worried about them coming over to demand to be included, but because if you draw the line to exclude them then it becomes more ambiguous where the line should be drawn, and relatively low (but non-zero) influence members of the coalition might be worried about also being excluded. And, as fellow humans, it is probably relatively low cost to include them—they’re unlikely to have wildly divergent values or be utility monsters etc.
You might ask, is it not also a solid Schelling point to include all entities whatsoever?
First, not really: we don’t have good definitions of “all sentient beings”, not nearly as good as “all humans”. It might be different if, e.g., we had time travel, such that we would also have to worry about intermediate evolutionary steps between humans and non-human animals, but we don’t.
In the future, we will have more ambiguous cases, but CEV can handle it. If someone wants to modify themselves into a utility monster, maybe we would want to let them do so, but discount their weighting in CEV to a more normal level when they do it (a toy sketch of such weight-capping follows this argument).
And second, it is not costless to expand the franchise. If you allow non-humans preemptively, you are opening yourself up to, for example, the xenophobic aliens scenario, but also potentially to who-knows-what other dangerous situations, since entities could have arbitrary values.
And that’s why expanding the franchise to all humans makes sense, even if individuals don’t care about other humans that much, but expanding to all sentients does not, even if people do care about other sentients.
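For concreteness, here is a toy sketch of the weight-capping idea mentioned above (entirely my own illustration; the names, numbers, and the simple additive aggregation are assumptions, not anything from the paper or from any existing CEV proposal):

```python
# Toy illustration (hypothetical): aggregate per-agent "volition fulfilment"
# scores while capping each agent's weight at a normal baseline, so that
# self-modifying into a utility monster buys no extra influence.

from dataclasses import dataclass

BASELINE_WEIGHT = 1.0  # assumed "normal" per-person weight


@dataclass
class Agent:
    name: str
    claimed_weight: float  # weight the agent's preference intensity would claim
    fulfilment: float      # how well a candidate action satisfies this agent, in [0, 1]


def capped_aggregate(agents: list[Agent]) -> float:
    """Sum each agent's fulfilment, with its weight capped at BASELINE_WEIGHT."""
    return sum(min(a.claimed_weight, BASELINE_WEIGHT) * a.fulfilment for a in agents)


if __name__ == "__main__":
    agents = [
        Agent("ordinary person", claimed_weight=1.0, fulfilment=0.6),
        Agent("self-made utility monster", claimed_weight=1_000_000.0, fulfilment=0.9),
    ]
    # Despite its enormous claimed weight, the monster contributes at most
    # BASELINE_WEIGHT * 0.9 to the total, so the result is 0.6 + 0.9 = 1.5.
    print(capped_aggregate(agents))
```

The point is only that a time-indexed CEV could in principle clamp a self-modified agent’s weight back to its pre-modification baseline; detecting such modifications is, of course, the hard part.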
In response to the rest of your comment:
If you want to argue that s-risks would be prevented for certain, please address the object-level arguments I present.
If humans would want to prevent s-risks, then they would be prevented. If humans would not want to prevent s-risks, they would not be prevented.
If you want to argue that the occurrence of s-risks would not be bad, you want to argue for a particular view in normative and practical ethics.
You’re the one arguing that people should override their actual values, and instead of programming an AI to follow their actual values, do something else! Without even an instrumental reason to do so (other than alleged moral considerations that aren’t in their actual values, but coming from some other magical direction)!
Asking someone to do something that isn’t in their values, without giving them instrumental reasons to do so, makes no sense.
It is you who needs a strong meta-ethical case for that. It shouldn’t be the objector who has to justify not overriding their values!