Why “moral nihilism—the assumption that no moral facts exist—implies the impossibility of building aligned AGI”?
I personally think that alignment is impossible because, to put it briefly, "humans don't have values": they have complex, unstable behaviour, which can be simplified by using the idea of preferences, but there are no actual preferences.
These two claims seem similar: the non-existence of values and the non-existence of moral facts. But are they actually the same claim?
This is an interesting point to bring up, and maybe one I’ll find a way to explore more.
You’re right that there is a sense in which humans don’t seem to have preferences or values because both are fuzzy categories that draw arbitrary distinctions. The best way to rescue them seems to be to think of values as thoughts that meet at least the following criteria:
orderable (though in practice not totally orderable, or even partially or pre-orderable)
about experiences (to use your framing, perhaps about possible behaviors)
There might be other criteria, but I think these are enough to let us carve up the space of candidate behaviors such that we exclude the things that are not values and include the things that are. This gives a rather odd version of what preferences/values are, because it eliminates the normative assumptions normally made when talking about preferences in order to better reflect the "unstable behavior" of humans. I still find value in talking about values, though, because it gives us a way to distinguish between the interior experience of valuing and the exterior experience of acting.
On your second point I disagree, for the reason just given: there is a sense in which we can talk about values that are non-normative and so avoid the need for moral facts, making the two claims not equivalent. I have, though, neglected to address this take on what we mean by values in this section (or elsewhere in the paper), so I should make sure it's clear I'm holding it separate from other discussions of values/preferences where normativity is sought or assumed.
I have started to write a longer text where I will explore the idea of the non-existence of values. I expect to have a presentable version in 2-3 months and will share it on LW, so we can compare our views on this topic again then. In it I will address three different ways we could learn a person's values: 1) their actions, 2) their emotions, 3) their external claims, which could differ from their thoughts about their values; so your idea may be a fourth level.
Great, I look forward to it! That's a topic I've been somewhat unwilling to tackle just now, because I've identified my current work as prerequisite to the other lines of thinking I want to explore, but I do consider it crucial to how we're going to address the problems we face in designing alignment schemes.