What makes you say that? It’s not true. My preferences have changed many times.
Distinguish formal preference from likes. Formal preference is like a prior: it is both the current beliefs and the procedure for updating those beliefs; the beliefs change, but the procedure does not. Likes are like beliefs: they change all the time, according to formal preference, in response to observations and reflection. Of course, we might consider jumping to a meta level, where the procedure for updating beliefs is itself subject to revision; but this doesn’t really change the game: you’ve just named some of the beliefs that change according to the fixed prior “object-level priors”, and named the process of revising those beliefs according to the fixed prior the “process of changing the object-level prior”.
When formal preference changes, that by definition means it changed not according to the (former) formal preference; that is, something undesirable happened. Humans are not able to hold their preference fixed, which means that their preferences do change; this is what I call “value drift”.
You are locked into some preference in the normative sense, not the factual one. This means that value drift does change your preference, but that it is actually desirable (for you) for your formal preference to never change.
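To make the prior analogy above concrete, here is a minimal sketch assuming a toy Beta-Bernoulli coin model (an illustration only, not a proposed formalization of “formal preference”): the update procedure is a single fixed function, while the belief state it operates on changes with every observation.

```python
# Toy illustration (assumed model, not anyone's formalism): conjugate
# Beta-Bernoulli updating. The update *procedure* is fixed once and for
# all; the *beliefs* it produces change with every observation.

def update(belief, observation):
    """Fixed update procedure: never changes, whatever is observed."""
    alpha, beta = belief
    return (alpha + 1, beta) if observation else (alpha, beta + 1)

belief = (1, 1)  # initial belief state: uniform prior over a coin's bias
for obs in [True, True, False, True]:
    belief = update(belief, obs)             # the beliefs change...
    print(belief, belief[0] / sum(belief))   # ...update() itself never does
```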
I object to your talking about “formal preference” without having a formal definition. Until you invent one, please let’s talk about what normal humans mean by “preference” instead.
I’m trying to find a formal understanding of a certain concept, and this concept is not what is normally called “preference”, as in “likes”. To distinguish it from the word “preference”, I used the label “formal preference” in the above comment to refer to this concept I don’t fully understand. Maybe the adjective “formal” is inappropriate for something I can’t formally define, but it’s not an option to talk about a different concept, as I’m not interested in a different concept. Hence I’m confused about what you are really suggesting by
Until you invent one, please let’s talk about what normal humans mean by “preference” instead.
For the purposes of FAI, what I’m discussing as “formal preference”, which is the same as “morality”, is clearly more important than likes.
I’d be willing to bet money that any formalization of “preference” that you invent, short of encoding the whole world into it, will still describe a property that some humans do modify within themselves. So we aren’t locked in, but your AIs will be.
Do humans modify that property, or find it desirable to modify it? The distinction between factual and normative is very important here, since we are talking about preference, which is purely normative. If humans prefer a different preference to a given one, they do so in some lawful way, according to some preference criterion (that they hold in their minds). All such meta-steps should be included. (Of course, it might prove impossible to formalize in practice.)
As for the “encoding the whole world” part, that’s the ontology problem, and I’m pretty sure that it’s enough to encode preference about the strategy (external behavior, given all possible observations) of a given concrete agent in order to preserve all of human preference. Preference about the external world, or about the way the agent works on the inside, is not required.
What makes you say that Bayesians are locked in? It’s not true. If they’re presented with evidence for or against their beliefs, they’ll change them.
You’re talking about posteriors. They’re talking about priors, presumably foundational priors that for some reason aren’t posteriors for any computations. An important question is whether such priors exist.
But your beliefs are your posteriors, not your priors. If the only thing that’s locked in is your priors, that’s not a locking-in at all.
That’s not obvious. You’d need to study many specific cases, and see whether starting from different priors still leads to the same final posteriors. There might be no way to “get there from here” for some priors.
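A quick numerical version of the kind of case study suggested above, assuming a Beta-Bernoulli coin model (the prior names and numbers are illustrative, not from the discussion): give agents with very different non-dogmatic priors the same data and check whether their final posteriors agree.

```python
# Assumed setup: a coin with unknown bias 0.7, three agents with very
# different Beta priors, all shown the same 10,000 flips.

import random

random.seed(0)
true_bias = 0.7
flips = [random.random() < true_bias for _ in range(10_000)]
heads = sum(flips)
tails = len(flips) - heads

priors = {"uniform": (1, 1), "skeptical": (1, 99), "credulous": (99, 1)}

for name, (alpha, beta) in priors.items():
    # Conjugate update: the posterior is Beta(alpha + heads, beta + tails).
    posterior_mean = (alpha + heads) / (alpha + beta + heads + tails)
    print(f"{name:9s} posterior mean of the bias: {posterior_mean:.3f}")

# All three land near 0.7: for these priors the data washes the prior out.
# The following comments concern priors for which this fails.
```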
When we speak of the values that an organism has, which are analogous to the priors an organism starts with, it’s routine to speak of the role of the initial values as locking in a value system. Why do we treat these cases differently?
There might be no way to “get there from here” for some priors.
That’s obviously true for priors that initially assign probability zero somewhere. But as Cosma Shalizi loves pointing out, Diaconis and Freedman have shown it can happen for more reasonable priors too, where the prior is “maladapted to the data generating process”.
This is of course one of those questionable cases with a lot of infinities being thrown around, and we know that applying Bayesian reasoning with infinities is not on fully solid footing. And much of the discussion is about failure to satisfy Frequentist conditions that many may not care about (though they do have a section arguing we should care). But it is still a very good paper, showing that non-zero probability isn’t quite good enough for some continuous systems.
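The zero-probability case mentioned above (“priors that initially assign probability zero somewhere”) can be shown in a very small simulation. This is an assumed toy model, far simpler than the Diaconis and Freedman setting, but it illustrates why there may be no way to “get there from here”.

```python
# Assumed toy model: a prior over a coin's bias supported only on
# {0.1, 0.3, 0.5}, while the true bias is 0.9. The Bayesian update is
# applied correctly forever, yet no mass can ever flow to a hypothesis
# that started at probability zero.

import random

random.seed(0)
true_bias = 0.9
posterior = {0.1: 1 / 3, 0.3: 1 / 3, 0.5: 1 / 3}  # zero prior mass on 0.9

for _ in range(5_000):
    heads = random.random() < true_bias
    # Fixed Bayesian update, restricted to the supported hypotheses.
    unnormalized = {h: p * (h if heads else 1 - h) for h, p in posterior.items()}
    total = sum(unnormalized.values())
    posterior = {h: p / total for h, p in unnormalized.items()}

print(posterior)  # essentially all mass on 0.5, the least-wrong hypothesis;
                  # the true bias 0.9 stays at probability zero forever.
```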