Hmm, I’m not sure I fully understand the concept of “X statements” you’re trying to introduce, though it does feel similar in some ways to contractualist reasoning. Since the concept is still pretty vague to me, I don’t feel like I can say much about it, beyond mentioning several ideas / concepts that might be related:
- Immanent critique (a way of pointing out the contradictions in existing systems / rules)
- Reasons for action (especially justificatory reasons)
- Moral naturalism (the meta-ethical position that moral statements are statements about the natural world)
Thank you! Sorry, I should have formulated my question better.
I meant that from time to time people come up with ideas like “maybe AI shouldn’t learn human values/ethics in the classical sense” or “maybe learning something that isn’t a human value can help with learning human values”:
- Impact measures. “Impact” by itself is not a human value; it exists beyond human values. (A toy sketch of an impact-penalized reward appears right after this list.)
- Your idea of contractualism. “Contracts” are not human values in the classical sense. You say that human values make sense only in the context of a society and a specific reality.
- Normativity by abramdemski. “Normativity” is not 100% about human values: for example, there is normativity in language.
- My idea: describe values/ethics as a system and study it in the context of all other systems.
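To make the “impact” example slightly more concrete, here is a minimal toy sketch of an impact-penalized reward. It is entirely my own illustration: the feature-vector states, the impact_penalty function and the lam weight are all made up, not something proposed in this thread. The point is only that “impact” is a quantity defined without reference to any human value, yet subtracting it from the task reward changes what an agent is willing to do:

```python
# Toy illustration only: "impact" measured as how far an action pushes the
# world away from a baseline (what would have happened with no action),
# summed over abstract world-features. Nothing here encodes a human value.

def impact_penalty(state_after, baseline_state):
    """How much the world differs from the no-op baseline, feature by feature."""
    return sum(abs(a - b) for a, b in zip(state_after, baseline_state))

def shaped_reward(task_reward, state_after, baseline_state, lam=0.5):
    """Task reward minus a scaled impact penalty (lam is a made-up weight)."""
    return task_reward - lam * impact_penalty(state_after, baseline_state)

# An action that earns 1.0 task reward but perturbs two world-features:
print(shaped_reward(1.0, state_after=(0, 2, 1), baseline_state=(0, 0, 0)))
# -> 1.0 - 0.5 * 3 = -0.5, so the "high-impact" plan is no longer attractive
```

Real impact measures (e.g. relative reachability or attainable utility preservation) are far more sophisticated, but the structure is the same: a value-free quantity that still shapes behaviour.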
The common theme of all these ideas is describing human values as part of something bigger. I thought it would be rational to give this entire area a name (“beyond human values”), compare the ideas in that context, and answer the questions: why do we even bother going there, and what can we gain there in the best case? (In theory, any approach can be replaced by a very long list of direct instructions, but we are looking for something more convenient than “direct instructions”.) Maybe we should try to answer those questions in general before trying to justify specific approaches. And I think there shouldn’t be a conflict between different approaches: they can share results and be combined in various ways.
What do you think about this whole “beyond human values” area?
Apologies for the belated reply.

Yes, the summary you gave above checks out with what I took away from your post. I think it sounds good at a high level, but it is still too vague for me to say much in more detail. Values/ethics are definitely a system (e.g., one might think that morality was evolved by humans for the purposes of co-operation), but at the end of the day you’re going to have to make some concrete hypothesis about what that system is in order to make progress. Contractualism is one such concrete hypothesis, and folding ethics under the broader scope of normative reasoning is another way to understand the underlying logic of ethical reasoning. Moral naturalism is another way of going “beyond human values”, because it argues that statements about ethics can be reduced to statements about the natural world.

Hopefully this is helpful food for thought!
In any case, I think your idea (and abramdemski’s) should be getting more attention.
> Values/ethics are definitely a system (e.g., one might think that morality was evolved by humans for the purposes of co-operation), but at the end of the day you’re going to have to make some concrete hypothesis about what that system is in order to make progress. Contractualism is one such concrete hypothesis, and folding ethics under the broader scope of normative reasoning is another way to understand the underlying logic of ethical reasoning.
I think the next possible step, before trying to guess a specific system/formalization, is to ask “what can we possibly gain by generalizing?”
For example, if you generalize values to normativity (including language normativity):
- You may be able to translate the process of learning language into the process of learning human values, and you can test the AI’s alignment on language.
- Maybe you can even translate some rules of language normativity into rules of human normativity.
I speculated that if you generalize values to statements about systems, then:
- You can translate some statements about simpler systems into statements about human values. You get simple but universal justifications for actions.
- You get very “dense” justifications of actions: e.g., you have a very large number of overlapping reasons not to turn the world into paperclips.
- You get very “recursive” justifications, where “recursiveness” means how easily one value can be derived/reconstructed from another.
What do we gain (in the best-case scenario) by generalizing values to “contracts”? I thought that maybe we could discuss what properties this generalization might have. Finding an additional property you want to get out of the generalization may help with the formalization (it can restrict the space of possible formal models).
> Moral naturalism is another way of going “beyond human values”, because it argues that statements about ethics can be reduced to statements about the natural world.
It’s not a very useful generalization/reduction if we don’t get anything from it, i.e. if “statements about the natural world” don’t have any significantly convenient properties.