A prior can’t be judged. It’s not assumed to be “correct”; it’s just the way you happen to process new information and make decisions, and there is no procedure for changing it from inside the system.
Locked in, huh? Then I don’t want to be a Bayesian.
If someone were locked in to a belief, they’d be using a point-mass prior. All other priors express some uncertainty.
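A minimal sketch of that contrast, with invented hypotheses and numbers: under repeated Bayesian updating, a point-mass prior never moves, while a prior that spreads any probability across the alternatives does.

```python
# A point-mass prior vs. a spread-out prior under repeated Bayesian updates
# on coin flips. Hypotheses and numbers are invented for illustration.

def update(prior, likelihood):
    """One discrete Bayes update: both arguments map hypothesis -> probability."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Two hypotheses about a coin's bias toward heads.
bias = {"fair": 0.5, "loaded": 0.9}

point_mass = {"fair": 1.0, "loaded": 0.0}  # "locked in" to the fair-coin belief
spread     = {"fair": 0.5, "loaded": 0.5}  # expresses some uncertainty

# Observe ten heads in a row.
for _ in range(10):
    likelihood_heads = dict(bias)  # P(heads | hypothesis)
    point_mass = update(point_mass, likelihood_heads)
    spread = update(spread, likelihood_heads)

print(point_mass)  # {'fair': 1.0, 'loaded': 0.0} -- never moves
print(spread)      # the loaded-coin hypothesis now gets ~0.997
```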
Since you are already locked into some preference anyway, you should figure out how best to compute within it (build a FAI).
What makes you say that? It’s not true. My preferences have changed many times.
Distinguish formal preference from likes. Formal preference is like a prior: it comprises both your current beliefs and the procedure for updating those beliefs; the beliefs change, but the procedure doesn’t. Likes are like beliefs: they change all the time, according to formal preference, in response to observations and reflection. Of course, we might consider jumping to a meta level, where the procedure for updating beliefs is itself subject to revision; but this doesn’t really change the game: you’ve just labeled some of the beliefs that change according to the fixed prior “object-level priors”, and labeled the process of revising those beliefs according to the fixed prior the “process of changing the object-level prior”.
When formal preference changes, that by definition means it changed not according to the (former) formal preference, that is, something undesirable happened. Humans are not able to hold their preference fixed, which means that their preferences do change; this is what I call “value drift”.
You are locked into some preference in a normative sense, not a factual one. This means that value drift does change your preference, but that it is actually desirable (for you) for your formal preference never to change.
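To make the meta-level point concrete in the prior analogy: a hyperprior over object-level priors collapses into a single fixed mixture prior, which is then updated by the same fixed rule as any other prior. A minimal sketch, with invented hypotheses and numbers:

```python
# A hyperprior over object-level priors is equivalent to one fixed mixture
# prior, updated by the same fixed rule as any other prior. All names and
# numbers here are invented for illustration.

def update(prior, likelihood):
    """One discrete Bayes update: both arguments map hypothesis -> probability."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Two candidate object-level priors over a coin's bias toward heads.
prior_a = {0.5: 0.9, 0.9: 0.1}
prior_b = {0.5: 0.2, 0.9: 0.8}

# Meta-level prior over which object-level prior is "right".
meta = {"a": 0.7, "b": 0.3}

# The whole hierarchy collapses into a single fixed prior over biases...
mixture = {b: meta["a"] * prior_a[b] + meta["b"] * prior_b[b] for b in prior_a}

# ...which is then updated by the unchanged procedure when a head is observed.
likelihood_heads = {0.5: 0.5, 0.9: 0.9}
print(update(mixture, likelihood_heads))  # {0.5: ~0.55, 0.9: ~0.45}
```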
I object to your talking about “formal preference” without having a formal definition. Until you invent one, please let’s talk about what normal humans mean by “preference” instead.
I’m trying to find a formal understanding of a certain concept, and this concept is not what is normally called “preference”, as in “likes”. To distinguish it from the word “preference”, I used the label “formal preference” in the comment above to refer to this concept I don’t fully understand. Maybe the adjective “formal” is inappropriate for something I can’t formally define, but it’s not an option to talk about a different concept, as I’m not interested in a different concept. Hence I’m confused about what you are really suggesting by

Until you invent one, please let’s talk about what normal humans mean by “preference” instead.
For the purposes of FAI, what I’m discussing as “formal preference”, which is the same as “morality”, is clearly more important than likes.
I’d be willing to bet money that any formalization of “preference” that you invent, short of encoding the whole world into it, will still describe a property that some humans do modify within themselves. So we aren’t locked in, but your AIs will be.
Do humans modify that property, or find it desirable to modify it? The distinction between the factual and the normative is very important here, since we are talking about preference, the purely normative. If humans prefer a preference different from a given one, they do so in some lawful way, according to some preference criterion (which they hold in their minds). All such meta-steps should be included. (Of course, it might prove impossible to formalize in practice.)
As for the “encoding the whole world” part, that’s the ontology problem, and I’m pretty sure it’s enough to encode preference over the strategy (external behavior, given all possible observations) of a given concrete agent to preserve all of human preference. Preference over the external world, or over the way the agent works on the inside, is not required.
What makes you say that Bayesians are locked in? It’s not true. If they’re presented with evidence for or against their beliefs, they’ll change them.
You’re talking about posteriors. They’re talking about priors, presumably foundational priors that for some reason aren’t posteriors for any computations. An important question is whether such priors exist.
But your beliefs are your posteriors, not your priors. If the only thing that’s locked in is your priors, that’s not a locking-in at all.
That’s not obvious. You’d need to study many specific cases, and see if starting from different priors reliably predicts the final posteriors. There might be no way to “get there from here” for some priors.
When we speak of the values that an organism has, which are analogous to the priors an organism starts with, it’s routine to speak of the role of the initial values as locking in a value system. Why do we treat these cases differently?
There might be no way to “get there from here” for some priors.

That’s obviously true for priors that initially assign probability zero somewhere. But as Cosma Shalizi loves pointing out, Diaconis and Freedman have shown it can happen for more reasonable priors too, where the prior is “maladapted to the data generating process”.
This is of course one of those questionable cases with a lot of infinities being thrown around, and we know that applying Bayesian reasoning with infinities is not on fully solid footing. And much of the discussion is about failure to satisfy Frequentist conditions that many may not care about (though they do have a section arguing we should care). But it is still a very good paper, showing that non-zero probability isn’t quite good enough for some continuous systems.
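For the zero-probability case the point follows directly from Bayes’ rule; a brief check, with invented numbers:

```python
# Why a prior that assigns probability zero somewhere can never recover
# (a standard Bayes'-rule observation; the numbers are invented):
# P(H | E) = P(E | H) * P(H) / P(E), so P(H) = 0 forces P(H | E) = 0
# for any evidence E with P(E) > 0.

p_h = 0.0            # prior probability of hypothesis H
p_e_given_h = 0.99   # even evidence that strongly favors H...
p_e = 0.10           # ...with P(E) > 0, here coming entirely from not-H
print(p_e_given_h * p_h / p_e)  # 0.0 -- the zero prior is permanent
```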
I have heard some argue for adjusting priors as a way of dealing with deductive discoveries, since we aren’t logically omniscient. I think I like that solution. Realizing you forgot to carry a digit in a previous update isn’t exactly new information about the belief. Obviously a perfect Bayesian wouldn’t have this issue, but I think we can feel free to evaluate priors given how far we are from that ideal.
But one man’s prior is another man’s posterior: I can take the belief that a medical test is 90% specific as given when using it to determine whether a patient has a disease, but I arrived at my beliefs about that test through Bayesian processes: either logical reasoning about the science behind the test or, more likely, trying the test on a bunch of people and using statistics to estimate its specificity.
So it may be mathematically wrong to tell me my 90% prior is false, but the 90% prior in the first case is the same number as the 90% posterior in the second, and it’s totally kosher to say that the 90% posterior from the second case is wrong (and, by extension, that I’m using the “wrong prior”).
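A hedged worked example of the first use described above, treating the test’s 90% specificity as a fixed input when judging one patient; the sensitivity and prevalence figures are invented for illustration:

```python
# Treating the test's 90% specificity as a fixed input when judging one
# patient. Only the 90% figure comes from the comment above; the sensitivity
# and prevalence are invented for illustration.

specificity = 0.90   # P(negative | no disease), from the comment
sensitivity = 0.95   # P(positive | disease), assumed
prevalence  = 0.02   # P(disease) before testing, assumed

# Bayes' rule for a positive test result.
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(round(p_disease_given_positive, 3))  # ~0.162: most positives are false positives
```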
The whole point of reflective consistency is that you shouldn’t have “foundational priors” in the sense of priors that aren’t the posterior of anything. Every foundational prior gets checked by how well it accords with other things, and in that sense is sort of a posterior.
So I agree with cousin_it that it would be a problem if every Bayesian believed their prior to be correct (as in: they got the correct posterior yesterday to use as their prior today).
Vladimir is using “prior” to mean a map from streams of observations to probability distributions over streams of future observations, not the prior probability before updating. Follow the link in his comment.
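A minimal sketch of a “prior” in that sense, assuming a binary observation stream; Laplace’s rule of succession stands in for whatever map Vladimir has in mind:

```python
# A "prior" as a fixed map from the observation stream so far to a predictive
# distribution over the next observation. Laplace's rule of succession is used
# purely as a stand-in; the binary-stream setting is an assumption.

from typing import Dict, Sequence

def predictive(observations: Sequence[int]) -> Dict[int, float]:
    """Map an observed 0/1 stream to a distribution over the next symbol
    (the predictive distribution of a uniform Beta(1, 1) prior on the
    unknown frequency of 1s)."""
    ones = sum(observations)
    n = len(observations)
    p_one = (ones + 1) / (n + 2)
    return {1: p_one, 0: 1 - p_one}

print(predictive([]))            # {1: 0.5, 0: 0.5}
print(predictive([1, 1, 1, 0]))  # {1: 0.666..., 0: 0.333...}
```

The map itself never changes; only the stream of observations fed into it does, which is the sense in which such a prior is “locked in”.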