In any case, I think your idea (and abramdemski’s) should be getting more attention.
Values/ethics are definitely a system (e.g., one might think that morality evolved in humans to enable cooperation), but at the end of the day you’re going to have to make some concrete hypothesis about what that system is in order to make progress. Contractualism is one such concrete hypothesis; folding ethics under the broader scope of normative reasoning is another way to get at the underlying logic of ethical reasoning.
I think the next possible step, before trying to guess a specific system/formalization, is to ask “what can we possibly gain by generalizing?”
For example, if you generalize values to normativity (including language normativity):
You may be able to translate the process of learning a language into the process of learning human values, and then test an AI’s alignment on language.
And maybe you can even translate some rules of language normativity into rules of human normativity.
I speculated that if you generalize values to statements about systems, then:
You can translate some statements about simpler systems into statements about human values. You get simple but universal justifications for actions.
You get very “dense” justifications of actions. E.g. you have a very large number of overlapping reasons not to turn the world into paperclips.
You get very “recursive” justifications. “Recursiveness” here means how easy it is to derive/reconstruct one value from another. (A toy sketch of these last two properties follows below.)
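To make those last two properties a bit more tangible, here is a very rough toy sketch of how “density” and “recursiveness” might look once values are treated as statements about systems: density as the number of distinct value-statements that overlap on the same verdict, and recursiveness as reachability in a “derivable from” relation. All the value names, objection sets, and derivation edges below are made up purely for illustration, not a worked-out formalization.

```python
# Toy model: everything here is hypothetical and only meant to illustrate
# what "dense" and "recursive" justifications could mean formally.

# Hypothetical value-statements, each phrased as a claim about some system.
values = {
    "preserve_complexity":  "complex systems should not be collapsed into trivial ones",
    "preserve_agency":      "systems containing agents should keep those agents intact",
    "preserve_information": "systems should not lose the information they encode",
}

# "Density": how many distinct value-statements independently object to an action.
objections_to_paperclipping = {
    "preserve_complexity",   # a paperclip world is far simpler than the current one
    "preserve_agency",       # it destroys every agent
    "preserve_information",  # it erases almost all encoded information
}
density = len(objections_to_paperclipping)

# "Recursiveness": how easily one value can be derived/reconstructed from another.
# Modelled here as an edge relation "(a, b) = a is derivable from b" plus reachability.
derivable_from = {
    ("preserve_agency", "preserve_complexity"),       # agents are complex subsystems
    ("preserve_information", "preserve_complexity"),  # information is structure, i.e. complexity
}

def reachable(start, goal, edges):
    """Can `goal` be reconstructed from `start` by chaining derivations?"""
    frontier, seen = [start], {start}
    while frontier:
        node = frontier.pop()
        if node == goal:
            return True
        for a, b in edges:
            if b == node and a not in seen:
                seen.add(a)
                frontier.append(a)
    return False

print(f"density of objections to paperclipping: {density}")
print("preserve_agency reconstructible from preserve_complexity:",
      reachable("preserve_complexity", "preserve_agency", derivable_from))
```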
What do we gain (in the best case scenario) by generalizing values to “contracts”? I thought that maybe we could discuss what possible properties this generalization may have. Finding an additional property you want to get out of the generalization may help with the formalization (it can restrict the space of possible formal models).
Moral naturalism is another way of going “beyond human values”, because it argues that statements about ethics can be reduced to statements about the natural world.
It’s not a very useful generalization/reduction if we don’t get anything from it, that is, if “statements about the natural world” don’t have significant convenient properties.