A large part of what we call “moral” preferences are meta-preferences about how the values of different people should be combined. For example, freedom of speech is a preference (or norm) about how different people’s values about saying different things should be integrated. In the general case, such “freedom of speech” is content-free, but in real life there are situations where the preferences of person A contradict the preferences of person B so strongly that freedom of speech has to be limited (e.g. hate speech, blackmail, overly long speech, defamation).
Below I quote a few paragraphs on this topic which I wrote recently for a draft on human values.
---
As a large part of human values are preferences about other people’s preferences, they can mutually exclude each other. E.g.: {I want “X loves me”, but X doesn’t want to be influenced by others’ desires}. Such situations are typical in ordinary life, but if such values are scaled and extrapolated, one side has to be chosen: either I win, or X does.
To escape such situations, something like the Kantian moral law, the Categorical Imperative, should be used as a meta-value, which basically regulates how different people’s values relate to each other:
Act only according to that maxim by which you can at the same time will that it should become a universal law.
In other words, the Categorical Imperative is something like “updateless decision theory”, in which you choose a policy without updating on your local position, so if everybody uses this principle, they will arrive at the same policy. (See a comparison of the different decision theories developed by the LessWrong community here.)
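To make the analogy concrete, here is a minimal toy sketch of my own (not from the quoted draft), assuming a standard symmetric Prisoner’s Dilemma payoff table: an agent that evaluates an action by asking “what if everyone adopted it?” picks cooperation, while an agent that optimizes from its own local position against a fixed opponent defects.

```python
# Toy illustration (illustrative assumption, not the author's formalism):
# "universalized" policy choice vs. ordinary local best-response in a
# symmetric Prisoner's Dilemma.

PAYOFFS = {  # (my_action, other_action) -> my payoff
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 5,
    ("defect", "defect"): 1,
}

ACTIONS = ("cooperate", "defect")

def universalized_choice():
    """Pick the action whose *universal* adoption gives the best payoff to any
    single agent -- a crude stand-in for the Categorical Imperative / an
    updateless, position-independent policy choice."""
    return max(ACTIONS, key=lambda a: PAYOFFS[(a, a)])

def local_best_response(other_action):
    """Pick the action that is best *given* the other's fixed action --
    ordinary reasoning that updates on one's local position."""
    return max(ACTIONS, key=lambda a: PAYOFFS[(a, other_action)])

if __name__ == "__main__":
    print(universalized_choice())            # -> "cooperate"
    print(local_best_response("cooperate"))  # -> "defect"
```

Because every agent running `universalized_choice` faces the same symmetric problem, they all land on the same policy, which is the point of the analogy.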
Some human values can be derived from the Categorical Imperative, e.g. it is bad to kill other people, as one does not want to be killed oneself. However, the main point is that such a meta-level principle about the relation between the values of different people can’t be derived just from observing a single person.
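A rough sketch of this kind of derivation (again my own illustration, with hypothetical names): a maxim is rejected if the agent itself would not accept the world in which everyone follows it.

```python
# Toy universalization test (hypothetical helper, for illustration only):
# a maxim fails if its universal adoption produces an outcome the proposing
# agent itself finds unacceptable.

def passes_universalization(effect_when_universal: str, unacceptable: set) -> bool:
    """Return True if the agent can will the maxim as a universal law."""
    return effect_when_universal not in unacceptable

my_unacceptable_outcomes = {"being killed", "being robbed"}

# "Kill whoever is inconvenient" universalizes to a world where I may be killed:
print(passes_universalization("being killed", my_unacceptable_outcomes))         # False

# "Tell the truth" universalizes to a world where I am told the truth:
print(passes_universalization("being told the truth", my_unacceptable_outcomes)) # True
```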
Moreover, most ethical principles describe interpersonal relations, so they are not about personal values but about the ways the values of different people should interact. Principles like the Categorical Imperative can’t be learned from observation; but they also can’t be deduced by pure logic, so they can’t be called “true” or “false”.
In other words, an AI learning human values can neither learn meta-ethical principles like the Categorical Imperative from observation nor deduce them from pure math. That is why we should provide the AI with a correct decision theory, though it is not clear why a “correct” theory should exist at all.
This could also be called a meta-ethical normative assumption: a high-level ethical principle which can’t be deduced from observations.