My position is something like “I haven’t yet seen anyone compellingly both define and argue for moral realism, so until then the whole notion seems confused to me”.
It is unclear to me what it would even mean for a moral claim to actually be objectively true or false. At the same time, there are many evolutionary and game-theoretic reasons why various moral claims would feel objectively true or false to human minds, and that seems sufficient to explain why many people have an intuition that moral realism is true. I have also personally found some of my moral beliefs changing as a result of psychological work—see the second example here—which makes me further inclined to believe that moral beliefs are all psychological (and thus subjective, as I understand the term).
So my argument is simply that there doesn’t seem to be any reason for me to believe in moral realism, somewhat analogous to how there doesn’t seem to be any reason for me to believe in a supernatural God.
I think a simpler way to state the objection is to say that “value” and “meaning” are transitive verbs. I can value money; Steve can value cars; Mike can value himself. It’s not clear what it would even mean for objective reality to value something. Similarly, a subject may “mean” a referent to an interpreter, but nothing can just “mean” or even “mean something” without an implicit interpreter, and “objective reality” doesn’t seem to be the sort of thing that can interpret.
I guess you could posit natural selection as being objective reality’s value system, but I have the feeling that’s not the kind of thing moral realists have in mind.
Indeed. A certain coronavirus has recently achieved remarkable gains in Darwinian terms, but this is not generally considered a moral triumph. Quite the opposite, as a dislike for disease is a near-universal human value.
It is often tempting to use near-universal human values as a substitute for objective values, and sometimes it works. However, such values are not always internally consistent because humanity isn’t. Values such as disease prevention came into conflict with other values such as prosperity during the pandemic, with some people supporting strict lockdowns and others supporting a return to business as usual.
And there are words such as “justice” which refer to ostensibly near-universal human values, except that people don’t always agree on what that value is or what it demands in any specific case.
How do you feel about:
1. There is a procedure/algorithm which doesn’t seem biased towards a particular value system, such that a class of AI systems that implement it end up having a common set of values, and they endorse the same values upon reflection.
2. This set of values might have something in common with what we, humans, call values.
If 1 and 2 seem at least plausible or conceivable, why can’t we use them as a basis to design aligned AI? Is it because of skepticism towards 1 or 2?
The “might” in 2 implies a “might not”.
It seems very hard for me to imagine how one could create a procedure that wasn’t biased towards a particular value system. E.g. Stuart Armstrong has written about how humans can be assigned any values whatsoever—you have to decide which parts of their behavior are due to genuine preferences and which parts are due to irrationality, and what values that implies. And the way you decide what’s correct behavior and what’s irrationality seems like the kind of choice that will depend on your own values. Even something like “this seems like the simplest way of assigning preferences” presupposes that it is valuable to pick a procedure based on its simplicity—though the post argues that even simplicity would fail to distinguish between several alternative ways of assigning preferences.
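To make that degeneracy a bit more concrete, here’s a minimal toy sketch in Python (my own illustration, not Armstrong’s formalism; all the names are made up): the same observed behavior can be decomposed either as a rational planner optimizing some reward, or as an anti-rational planner optimizing that reward’s negation, so behavior alone can’t settle which values the agent “really” has.

ACTIONS = ["left", "right"]
STATES = ["s0", "s1"]

# A hypothetical reward function: the agent "really" prefers going right.
def reward(state, action):
    return 1.0 if action == "right" else 0.0

# Planner 1: fully rational -- picks the action that maximizes the given reward.
def rational_planner(rew):
    return {s: max(ACTIONS, key=lambda a: rew(s, a)) for s in STATES}

# Planner 2: fully anti-rational -- picks the action that minimizes the given reward.
def antirational_planner(rew):
    return {s: min(ACTIONS, key=lambda a: rew(s, a)) for s in STATES}

# Decomposition A: rational planner + reward as written.
policy_a = rational_planner(reward)

# Decomposition B: anti-rational planner + the negated reward.
policy_b = antirational_planner(lambda s, a: -reward(s, a))

# Both decompositions predict exactly the same observed behavior,
# so the observations alone can't tell us which reward the agent "really" has.
assert policy_a == policy_b
print(policy_a)  # {'s0': 'right', 's1': 'right'} under both decompositions

Picking between such decompositions requires some extra criterion, and the choice of that criterion is where your own values sneak back in.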
Of course, just because we can’t be truly unbiased doesn’t mean we couldn’t be less biased, so maybe something like “pick the simplest system that produces sensible agents, breaking ties at random” could arguably be the least biased alternative. But human values seem quite complex; if there were some simple and unbiased solution that produced convergent values in all AIs that implemented it, it might certainly have something in common with what we call values, but that’s not a very high bar. There’s a sense in which all bacteria share the same goal, “making more (surviving) copies of yourself is the only thing that matters”, and I’d expect the convergent value system to end up being something like that. That has some resemblance to human values, since many humans also care about having offspring, but not very much.