unlike for other humans, we don’t have an instrumental reason to include them in the programmed value calculation, and to precommit to doing so, etc. For animals, it’s more of a terminal goal.
First, it seems plausible that we (in fact) do not have instrumental reasons to include all humans. As I argue in Section 4.2, there are some humans, such as “children, existing people who’ve never heard about AI or people with severe physical or cognitive disabilities unable to act on and express their own views on the topic”, who, if included, would also be included only because of our terminal goals, because they too matter.
If your view is that you have reasons to include only those whom you have instrumental reasons to include, then on your view the members of an AGI lab that developed ASI ought to include only themselves if they believe (in expectation) that they can successfully do so. This view is implausible: it is implausible that this is what they would have most moral reason to do.
Whether this is implausible or not is a question for normative and practical ethics, and (somewhat contrary to what you seem to believe) these kinds of discussions can be had, are had all the time inside and outside academia, and are fruitful in many instances.
if that terminal goal is a human value, it’s represented in CEV
As I argue in Section 2.2, it is not clear that implementing CEV would prevent s-risks for certain. Rather, there is a non-negligible chance that it would not. If you want to argue that s-risks would be prevented for certain, please address the object-level arguments I present. If you want to argue that the occurrence of s-risks would not be bad, you want to argue for a particular view in normative and practical ethics. As a result, you should argue for it by presenting arguments that justify certain views in these disciplines.
You don’t justify why this is a bad thing over and above human values as represented in CEV.
This seems to be the major point of disagreement. In the paper, when I say s-risks are morally undesirable, i.e. bad, I use “bad” and “morally undesirable” as they are commonly used in analytic philosophy and outside academia, for example when someone says “Hey, you can’t do that, that’s wrong”.
What exactly I, you or anyone else mean when we utter the words “bad”, “wrong”, and “morally undesirable” is the main question in the field of meta-ethics. Meta-ethics is very difficult and, contrary to what you suggest, I do not reject or disclaim moral realism, either in the paper or in my belief system. But I also do not endorse it. I am agnostic regarding this central question in meta-ethics: I suspend judgment because I believe I have not yet sufficiently familiarized myself with the various arguments for and against the possible positions. See: https://plato.stanford.edu/entries/metaethics/
This paper is not about meta-ethics; it is about practical ethics and some normative ethics. It is possible to do both practical and normative ethics while being agnostic, or even mistaken, about meta-ethics, as the whole academic fields of practical and normative ethics exemplify. In the same way, it is possible to attain knowledge about physics, for instance, without having a complete theory of what knowledge is.
If you want, you can try to show, based on meta-ethical considerations, that what my paper says about normative ethics is incorrect. But to do so, it would be quite helpful if you presented an argument with premises and a conclusion instead of asking questions.
Thank you for specifically citing passages of the paper in your comment.
If your view is that you have reasons to include only those whom you have instrumental reasons to include, then on your view the members of an AGI lab that developed ASI ought to include only themselves if they believe (in expectation) that they can successfully do so. This view is implausible: it is implausible that this is what they would have most moral reason to do.
I note that not everyone considers that implausible; for example, Tamsin Leake’s QACI takes this view.
I disagree with both Tamsin Leake and you: I think that including all humans, but only humans, makes the most sense. But for concrete reasons, not for free-floating moral reasons.
I was writing the following as a response to NicholasKees’ comment, but I think it belongs better as a response here:
...imagine you are in a mob in such a “tyranny of the mob” kind of situation, with mob-CEV. For the time being, imagine a small mob.
You tell the other mob members: “we should expand the franchise/function to other people not in our mob”.
OK, should the other mob members agree?
Maybe they agree with you that it is right that the function should be expanded to other humans, in which case mob-CEV would do it automatically.
Or they don’t agree, and still don’t agree after full consideration/extrapolation.
If they don’t agree, what do you do? Ask Total-Utility-God to strike them down for disobeying the One True Morality?
At this point you are stuck, if the mob-CEV AI has made the mob untouchable to entities outside it.
But there is something you could have done earlier. Earlier, you could have allied with other humans outside of the mob, to pressure the would-be-mob members to pre-commit to not excluding other humans.
And in doing so, you might have insisted on including all humans, not specifically the humans you were explicitly allying with, even if you didn’t directly care about everyone, because:
the ally group might shift over time, or people outside the ally group might make their own demands
if the franchise is not set to a solid Schelling point (like all humans) then people currently inside might still worry about the lines being shifted to exclude them.
Thus, you include the Sentinelese, not because you’re worried about them coming over to demand to be included, but because if you draw the line to exclude them then it becomes more ambiguous where the line should be drawn, and relatively low (but non-zero) influence members of the coalition might be worried about also being excluded. And, as fellow humans, it is probably relatively low cost to include them—they’re unlikely to have wildly divergent values or be utility monsters etc.
You might ask, is it not also a solid Schelling point to include all entities whatsoever?
First, not really: we don’t have a good definition of “all sentient beings”, not nearly as good as our definition of “all humans”. It might be different if, e.g., we had time travel, such that we would also have to worry about intermediate evolutionary steps between humans and non-human animals, but we don’t.
In the future, we will have more ambiguous cases, but CEV can handle it. If someone wants to modify themselves into a utility monster, maybe we would want to let them do so, but discount their weighting in CEV to a more normal level when they do it.
And second, it is not costless to expand the franchise. If you allow non-humans preemptively you are opening yourself up to, as an example, the xenophobic aliens scenario, but also potentially who-knows-what other dangerous situations since entities could have arbitrary values.
And that’s why expanding the franchise to all humans makes sense, even if individuals don’t care about other humans that much, but expanding to all sentients does not, even if people do care about other sentients.
In response to the rest of your comment:
If you want to argue that s-risks would be prevented for certain, please address the object-level arguments I present.
If humans would want to prevent s-risks, then they would be prevented. If humans would not want to prevent s-risks, they would not be prevented.
If you want to argue that the occurrence of s-risks would not be bad, you want to argue for a particular view in normative and practical ethics.
You’re the one arguing that people should override their actual values, and instead of programming an AI to follow their actual values, do something else! Without even an instrumental reason to do so (other than alleged moral considerations that aren’t in their actual values, but coming from some other magical direction)!
Asking someone to do something that isn’t in their values, without giving them instrumental reasons to do so, makes no sense.
It is you who needs a strong meta-ethical case for that. It shouldn’t be the objector who has to justify not overriding their values!