Distinguish formal preference and likes. Formal preference is like a prior: both the current beliefs and the procedure for updating them; the beliefs change, but not the procedure. Likes are like beliefs: they change all the time, according to formal preference, in response to observations and reflection. Of course, we might consider jumping to a meta level, where the procedure for updating beliefs is itself subject to revision; but this doesn’t really change the game: you’ve just named some of the beliefs that change according to the fixed prior “object-level priors”, and named the process of revising those beliefs according to the fixed prior the “process of changing the object-level prior”.
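To make the prior analogy concrete, here is a minimal sketch (the coin-flipping setup and every name in it are my own illustration, not part of any proposed formalism): the beliefs keep changing under observation, while the procedure that changes them is never itself revised.

```python
# Minimal sketch of the prior analogy (all names invented for the illustration):
# the beliefs change with every observation, but `update`, the procedure that
# changes them, stays fixed.
HYPOTHESES = {"fair": 0.5, "biased": 0.5}
LIKELIHOOD = {
    "fair":   {"heads": 0.5, "tails": 0.5},
    "biased": {"heads": 0.9, "tails": 0.1},
}

def update(beliefs, observation):
    """Bayes' rule: the one fixed procedure."""
    posterior = {h: p * LIKELIHOOD[h][observation] for h, p in beliefs.items()}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

beliefs = HYPOTHESES
for obs in ("heads", "heads", "tails", "heads"):
    beliefs = update(beliefs, obs)   # the beliefs drift; `update` never changes
print(beliefs)
```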
When formal preference changes, that by definition means it changed not according to the (former) formal preference; that is, something undesirable happened. Humans are not able to hold their preference fixed, which means that their preferences do change; this is what I call “value drift”.
You are locked into some preference in the normative sense, not the factual one. This means that value drift does change your preference, but it is actually desirable (for you) for your formal preference to never change.
Formal preference is like a prior: both the current beliefs and the procedure for updating them; the beliefs change, but not the procedure.
I object to your talking about “formal preference” without having a formal definition. Until you invent one, please let’s talk about what normal humans mean by “preference” instead.
I’m trying to find a formal understanding of a certain concept, and this concept is not what is normally called “preference”, as in “likes”. To distinguish it from the word “preference”, I used the label “formal preference” in the above comment to refer to this concept I don’t fully understand. Maybe the adjective “formal” is inappropriate for something I can’t formally define, but talking about a different concept is not an option, as I’m not interested in a different concept. Hence I’m confused about what you are really suggesting by:
Until you invent one, please let’s talk about what normal humans mean by “preference” instead.
For the purposes of FAI, what I’m discussing as “formal preference”, which is the same as “morality”, is clearly more important than likes.
I’d be willing to bet money that any formalization of “preference” that you invent, short of encoding the whole world into it, will still describe a property that some humans do modify within themselves. So we aren’t locked in, but your AIs will be.
Do humans modify that property, or do they find it desirable to modify it? The distinction between factual and normative is very important here, since we are talking about preference, which is purely normative. If humans prefer a different preference to a given one, they do so in some lawful way, according to some preference criterion (that they hold in their minds). All such meta-steps should be included. (Of course, this might prove impossible to formalize in practice.)
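One way to picture “all such meta-steps should be included”, as a toy sketch under my own assumptions rather than anything proposed in the thread: the object-level likes get revised, but only ever by a single fixed criterion, which can itself be counted as part of the (unchanging) formal preference.

```python
# Toy sketch with invented names: object-level likes change over time, but only
# according to one fixed revision criterion, which never changes.
class Preference:
    def __init__(self, likes):
        self.likes = dict(likes)              # object level: subject to change

    def revise(self, evidence):
        """The fixed criterion: the only lawful way the likes ever change."""
        for item, delta in evidence.items():
            self.likes[item] = self.likes.get(item, 0.0) + delta

p = Preference({"reading": 1.0})
p.revise({"reading": -0.5, "hiking": 0.8})    # likes drift; `revise` stays fixed
print(p.likes)                                # {'reading': 0.5, 'hiking': 0.8}
```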
As for the “encoding the whole world” part, that is the ontology problem, and I’m pretty sure that it’s enough to encode preference about the strategy (external behavior, given all possible observations) of a given concrete agent in order to preserve all of human preference. Preference about the external world, or about the way the agent works on the inside, is not required.
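As a type-level sketch of “preference about strategy” (again my own illustration, with made-up names and a toy criterion): a strategy maps observation histories to actions, and the preference is just a ranking over such strategies, with no reference to the external world or the agent’s internals.

```python
from typing import Callable, Sequence

# Invented names for the sketch.
Observation = str
Action = str
Strategy = Callable[[Sequence[Observation]], Action]  # behavior given any observation history

def preference_score(strategy: Strategy) -> float:
    """Toy criterion over strategies: reward cooperating across a few short histories."""
    histories = [("obs",) * n for n in range(3)]
    return sum(1.0 for h in histories if strategy(h) == "cooperate")

always_cooperate: Strategy = lambda history: "cooperate"
alternating: Strategy = lambda history: "cooperate" if len(history) % 2 == 0 else "defect"

print(preference_score(always_cooperate) > preference_score(alternating))  # True
```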