“Everyone thinks they’ve won the Magical Belief Lottery. Everyone thinks they more or less have a handle on things, that they, as opposed to the billions who disagree with them, have somehow lucked into the one true belief system.”
-- R Scott Bakker, Neuropath
You mean, like every Bayesian believes their prior is correct?
Bayesians don’t believe they lucked into their priors. They have a reflectively consistent causal explanation for their priors.
Even if their explanation were correct, they would still have lucked into their priors. Others have different priors and no doubt different causes for their priors. So those Bayesians would still have needed luck in order to have the causes that produce correct priors instead of incorrect ones.
But that still doesn’t need to be luck. I got my priors offa evolution and they are capable of noticing when something works or doesn’t work a hundred times in a row. True, if I had a different prior, I wouldn’t care about that either. But even so, that I have this prior is not a question of luck.
It is luck in a sense—every way that your opinion differs from someone else’s, you believe that factors outside of your control (your intelligence, your education, et cetera) have blessed you in such a way that your mind has done better than that poor person’s.
It’s just that it’s not a problem. Lottery winners got richer than everyone else by luck, but that doesn’t mean they’re deluded in believing that they’re rich. But someone who had only weak evidence ze won the lottery should be very skeptical. The real point of this quote is that being much less wrong than average is an improbable state, and you need correspondingly strong evidence to support the possibility. I think many of the people on this site probably do have some of that evidence (things like higher than average IQ scores would be decent signs of higher than normal probability of being right) but it’s still something worth worrying about.
I think I agree with that: There’s nothing necessarily delusive about believing you got lucky, but it should generally require (at least) an amount of evidence proportional to the amount of purported luck.
Then it would make sense to use some evolutionary thingy instead of Bayesianism as your basic theory of “correct behavior”, as Shalizi has half-jokingly suggested.
Priors can’t be correct or incorrect.
(Clarified in detail in this comment.)
Sounds mysterious to me. Priors are not claims about the world?
Not quite. They are the way you process claims about the world. A claim has to come in the context of a method for its evaluation, but a prior can only be evaluated by comparing it to itself...
This downvoting should be accompanied with discussion. I’ve answered the objections that were voiced, but naturally I can’t refute an incredulous stare.
The normal way of understanding priors is that they are or can be expressed as joint probability distributions, which can be more or less well-calibrated. You’re skipping over a lot of inferential steps.
Right. We could talk of the quality of an approximation to a fixed object that is defined as the topic of a pursuit, even if we can’t choose the fixed object in the process and thus there is no sense in having preferences about its properties.
I can’t tell what you’re talking about.
Say, you are trying to figure out what the mass of an electron is. As you develop your experimental techniques, there will be better or worse approximate answers along the way. It makes sense to characterize the approximations to the mass you seek to measure as more or less accurate, and to characterize someone else’s wild guesses about this value as correct or not correct at all.
On the other hand, it doesn’t make sense to similarly characterize the actual mass of an electron. The actual mass of an electron can’t be correct or incorrect, can’t be more or less well-calibrated—talking this way would indicate a conceptual confusion.
When I talked about prior or preference in the above comments, I meant the actual facts, the concepts that we might want to approximate, not particular approximations to those facts. Characterizing these facts as correct or incorrect doesn’t make sense for similar reasons.
Furthermore, since they are fixed elements of an ideal decision-making algorithm, it doesn’t make sense to ascribe preference to them (more or less useful, more or less preferable). This is a bit more subtle than with the example of the mass of an electron, since in that case we had a factual estimation process, while with decision-making we also have a moral estimation process. With factual estimation, the fact that we are approximating isn’t itself an approximation, and so can’t be more or less accurate. With moral estimation, we are approximating the true value of a decision (event), and the actual value of a decision (event) can’t be too high or too low.
I follow you up until you conclude that priors cannot be correct or incorrect. An agent with more accurate priors will converge toward the actual answer more quickly—I’ll grant that’s not a binary distinction, but it’s a useful one.
If you are an agent with a “less accurate prior”, then you won’t be able to recognize a “more accurate prior” as a better one. You are trying to look at the situation from the outside, but that’s not possible when we are discussing your own decision-making algorithms.
If I’m blind, I won’t be able to recognize a sighted person by sight. That doesn’t change the fact that the sighted person can see better than the blind person.
There is no God’s view to define the truth, and no Faith to attain it. You only get to use your own eyes. If I predict a fair coin will come up “heads”, and you predict it’ll come up “tails”, and it does come up “tails”, who was closer to the truth? The truth of such a prediction is not in how well it aligns with the outcome, but in how well it takes into account the available information, how well it processes the state of uncertainty. What should be believed given the available information and what is actually true are two separate questions, and the latter question is never asked, as you never have all the information, only some state of uncertainty. Reality is not transparent; it’s not possible to glimpse the hidden truth, only to cope with uncertainty. Confuse the two at your own peril.
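(Illustrative aside: the point that a forecast should be judged by how it handles the available uncertainty, not by a single outcome, can be made concrete with a toy scoring calculation. The sketch below uses made-up numbers and is only a sketch: an honest 0.5 forecast on a fair coin sometimes loses to an overconfident 0.9 forecast on an individual flip, but does better on average.)

```python
# Sketch: average log score of two forecasters predicting a fair coin.
# The honest 0.5 forecast "loses" on individual heads flips to a 0.9 forecast,
# but scores better on average, because 0.5 is what the information supports.
import math
import random

random.seed(0)
N = 100_000
flips = [random.random() < 0.5 for _ in range(N)]  # True = heads

def avg_log_score(p_heads, outcomes):
    # Average log probability the forecast assigned to what actually happened.
    return sum(math.log(p_heads if o else 1 - p_heads) for o in outcomes) / len(outcomes)

print("honest 0.5 forecast:       ", avg_log_score(0.5, flips))  # ~ log(0.5) = -0.693
print("overconfident 0.9 forecast:", avg_log_score(0.9, flips))  # ~ -1.20
```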
I’m so confused, I can’t even tell if we disagree. What I am thinking of is essentially the argument in Eliezer Yudkowsky’s “Inductive Bias”:
The more inductive bias you have, the faster you learn to predict the future, but only if your inductive bias does in fact concentrate more probability into sequences of observations that actually occur. If your inductive bias concentrates probability into sequences that don’t occur, this diverts probability mass from sequences that do occur, and you will learn more slowly, or not learn at all, or even—if you are unlucky enough—learn in the wrong direction.
Inductive biases can be probabilistically correct or probabilistically incorrect, and if they are correct, it is good to have as much of them as possible, and if they are incorrect, you are left worse off than if you had no inductive bias at all. Which is to say that inductive biases are like any other kind of belief; the true ones are good for you, the bad ones are worse than nothing. In contrast, statistical bias is always bad, period—you can trade it off against other ills, but it’s never a good thing for itself. Statistical bias is a systematic direction in errors; inductive bias is a systematic direction in belief revisions.
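(Illustrative aside: a minimal sketch of the quoted claim, not from the post itself and with made-up numbers. Three Beta priors watch the same coin with true bias 0.8: the prior that concentrates probability near the truth converges fastest, while the one that concentrates it away from the truth has to unlearn its bias first.)

```python
# Sketch: Beta-Bernoulli updating on a coin with true bias 0.8.
# A "helpful" inductive bias (Beta(8,2)) predicts well sooner; a "harmful"
# one (Beta(2,8)) learns more slowly than having no bias at all (Beta(1,1)).
import random

random.seed(0)
TRUE_P = 0.8
flips = [random.random() < TRUE_P for _ in range(50)]  # True = heads

def posterior_mean(alpha, beta, data):
    # Posterior mean of the bias p under a Beta(alpha, beta) prior.
    heads = sum(data)
    tails = len(data) - heads
    return (alpha + heads) / (alpha + beta + heads + tails)

priors = {
    "helpful bias, Beta(8, 2)": (8, 2),
    "no bias,      Beta(1, 1)": (1, 1),
    "harmful bias, Beta(2, 8)": (2, 8),
}
for name, (a, b) in priors.items():
    estimates = [round(posterior_mean(a, b, flips[:n]), 3) for n in (0, 5, 20, 50)]
    print(name, "posterior mean after 0/5/20/50 flips:", estimates)
```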
If you can inspect and analyze your own prior (using your own prior, of course) you can notice that your prior is not reflectively consistent, that you can come up with other priors that your prior expects to get better results. Humans, who are not ideal Bayesians but have a concept of ideal Bayesians, have actually done this.
(Though reflective consistency does not guarantee effectiveness. Some priors are too ineffective to notice they are ineffective.)
This might be a process of figuring out what your prior is, but the approximations along the way are not your prior (they might be some priors).
I see three priors to track here:
1. The prior I would counterfactually have had if I were not able to make this comparison.
2. The ideal prior I am comparing my approximation of prior (1) to.
3. My actual prior resulting from this comparison, reflecting that I try to implement prior (2), but cannot always compute/internalize it.
I have prior (3), but I believe prior (2) is better.
If you have a concept of prior (2), and wish to get better at acting according to it over time, then (2) is your real prior. It is what you (try to) use to make your decisions. (3) is just a tool you employ in the meantime, and you may pick a better tool, judging with (2). I don’t know what (1) means (or what (2) means when (1) is realized).
(1) is the prior I would have if I had never inspected and analyzed my prior. It is a path not taken from prior (3). The point of introducing it was to point out that I really believe (2) is better than (3), as opposed to (2) is better than (1) (which I also believe, but it isn’t the point).
Does “your prior” refer to (A) the prior you identify with, or (B) the prior that describes your actual beliefs as you process evidence, or something else?
If (A), I don’t understand:
This might be a process of figuring out what your prior is, but the approximations along the way are not your prior
If (B), I don’t understand:
If you have a concept of prior (2), and wish to get better at acting according to it over time, then (2) is your real prior.
They can be more or less useful, though.
According to what criterion? You’d end up comparing a prior to the prior you hold, with the “best” prior for you just being the same as yours. Like with preference. Clearly not the concept Unknowns was assuming—you don’t need luck to satisfy a tautology.
Correspondence to reality.
(Do you realize how inferentially far the idea of prior as part of preference is from the normal worldview here?)
Of being better at predicting what happens, of course.
You can’t judge based on info you don’t have. Based on what you do have, you can do no better than current prior.
But you can go and get info, and then judge, and say, “That prior that I held was wrong.”
You’re speaking as if all truth were relative. I don’t know if you mean this, but your comments in this thread imply that there is no such thing as truth.
You’ve recently had other discussions about values and ethics, and the argument you’re making here parallels your position in that argument. You may be trying to keep your beliefs about values, and about truths in general, in syntactic conformance. But rationally I hope you agree they’re different.
It is only wrong not to update.
And, of course the priors must be updated the correct way.
Nonetheless, it is greatly preferable to have a prior that led to decisions that gave high utility, rather than one that led to decisions that gave low utility. Of course this can’t be measured “before hand”. But the whole point of updating is to get better priors, in this exact sense, for the next round of decisions and updates.
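(Illustrative aside: a tiny sketch of that loop, with made-up numbers. Updating a Beta(1,1) prior on two batches of coin flips in sequence, with each day’s posterior serving as the next day’s prior, gives exactly the same final belief as updating on all the data at once.)

```python
# Sketch: yesterday's posterior is today's prior.
# Sequential Beta-Bernoulli updates match a single batch update.
def update(alpha, beta, heads, tails):
    # Conjugate update of a Beta(alpha, beta) prior on coin-flip counts.
    return alpha + heads, beta + tails

day1 = update(1, 1, heads=7, tails=3)    # posterior after day 1
day2 = update(*day1, heads=2, tails=8)   # day 1's posterior used as day 2's prior
batch = update(1, 1, heads=9, tails=11)  # all the data at once
print(day2, batch)                       # (10, 12) (10, 12) -- identical
```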
I am in violent agreement.
A prior can’t be judged. It’s not assumed to be “correct”. It’s just the way you happen to process new info and make decisions, and there is no procedure to change the way it is from inside the system.
Locked in, huh? Then I don’t want to be a Bayesian.
If someone was locked in to a belief, then they’d use a point mass prior. All other priors express some uncertainty.
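(Illustrative aside: a minimal sketch of the difference, with made-up data. After twenty heads in a row, a point-mass prior on p = 0.5 is exactly where it started, because every competing hypothesis began with probability zero, while a uniform Beta(1,1) prior has moved most of its mass toward high p.)

```python
# Sketch: a point-mass prior never updates; any non-degenerate prior does.
heads, tails = 20, 0

# Point mass: P(p = 0.5) = 1. Bayes' rule leaves it unchanged, since every
# alternative hypothesis started with probability zero.
point_mass_estimate = 0.5

# Beta(1,1) (uniform) prior: posterior is Beta(1 + heads, 1 + tails).
beta_posterior_mean = (1 + heads) / (2 + heads + tails)

print("point-mass prior, posterior mean:", point_mass_estimate)           # 0.5, forever
print("Beta(1,1) prior,  posterior mean:", round(beta_posterior_mean, 3))  # ~0.955
```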
Since you are already locked in in some preference anyway, you should figure out how to compute within it best (build a FAI).
What makes you say that? It’s not true. My preferences have changed many times.
Distinguish formal preference and likes. Formal preference is like prior: both current beliefs and procedure for updating the beliefs; beliefs change, but not the procedure. Likes are like beliefs: they change all the time, according to formal preference, in response to observations and reflection. Of course, we might consider jumping to a meta level, where the procedure for updating beliefs is itself subject to revision; this doesn’t really change the game: you’ve just named some of the beliefs that change according to the fixed prior “object-level priors”, and named the process of revising those beliefs according to that fixed prior the “process of changing the object-level prior”.
When formal preference changes, it by definition means that it changed not according to (former) formal preference, that is, something undesirable happened. Humans are not able to hold their preference fixed, which means that their preferences do change, which is what I call “value drift”.
You are locked in to some preference in the normative sense, not the factual one. This means that value drift does change your preference, but it is actually desirable (for you) for your formal preference to never change.
Formal preference is like prior: both current beliefs and procedure for updating the beliefs; beliefs change, but not the procedure.
I object to your talking about “formal preference” without having a formal definition. Until you invent one, please let’s talk about what normal humans mean by “preference” instead.
I’m trying to find a formal understanding of a certain concept, and this concept is not what is normally called “preference”, as in “likes”. To distinguish it from the word “preference”, I used the label “formal preference” in the above comment to refer to this concept I don’t fully understand. Maybe the adjective “formal” is inappropriate for something I can’t formally define, but it’s not an option to talk about a different concept, as I’m not interested in a different concept. Hence I’m confused about what you are really suggesting by
Until you invent one, please let’s talk about what normal humans mean by “preference” instead.
For the purposes of FAI, what I’m discussing as “formal preference”, which is the same as “morality”, is clearly more important than likes.
I’d be willing to bet money that any formalization of “preference” that you invent, short of encoding the whole world into it, will still describe a property that some humans do modify within themselves. So we aren’t locked in, but your AIs will be.
Do humans modify that property, or find it desirable to modify it? The distinction between factual and normative is very important here, since we are talking about preference, the pure normative. If humans prefer different preference from a given one, they do so in some lawful way, according to some preference criterion (that they hold in their minds). All such meta-steps should be included. (Of course, it might prove impossible to formalize in practice.)
As for the “encoding the whole world” part, it’s the ontology problem, and I’m pretty sure that it’s enough to encode preference about strategy (external behavior, given all possible observations) of a given concrete agent, to preserve all of human preference. Preference about external world or the way the agent works on the inside is not required.
What makes you say that Bayesians are locked in? It’s not true. If they’re presented with evidence for or against their beliefs, they’ll change them.
You’re talking about posteriors. They’re talking about priors, presumably foundational priors that for some reason aren’t posteriors for any computations. An important question is whether such priors exist.
But your beliefs are your posteriors, not your priors. If the only thing that’s locked in is your priors, that’s not a locking-in at all.
That’s not obvious. You’d need to study many specific cases, and see if starting from different priors reliably predicts the final posteriors. There might be no way to “get there from here” for some priors.
When we speak of the values that an organism has, which are analogous to the priors an organism starts with, it’s routine to speak of the role of the initial values as locking in a value system. Why do we treat these cases differently?
There might be no way to “get there from here” for some priors.
That’s obviously true for priors that initially assign probability zero somewhere. But as Cosma Shalizi loves pointing out, Diaconis and Freedman have shown it can happen for more reasonable priors too, where the prior is “maladapted to the data generating process”.
This is of course one of those questionable cases with a lot of infinities being thrown around, and we know that applying Bayesian reasoning with infinities is not on fully solid footing. And much of the discussion is about failure to satisfy Frequentist conditions that many may not care about (though they do have a section arguing we should care). But it is still a very good paper, showing that non-zero probability isn’t quite good enough for some continuous systems.
I have heard some argue for adjusting priors as a way of dealing with deductive discoveries since we aren’t logically omniscient. I think I like that solution. Realizing you forgot to carry a digit in a previous update isn’t exactly new information about the belief. Obviously a perfect Bayesian wouldn’t have this issue but I think we can feel free to evaluate priors given that we are so far away from that ideal.
But one man’s prior is another man’s posterior: I can use the belief that a medical test is 90% specific when using it to determine whether a patient has a disease, but I arrived at my beliefs about that medical test through Bayesian processes—either logical reasoning about the science behind the test, or more likely trying the test on a bunch of people and using statistics to estimate its specificity.
So it may be mathematically wrong to tell me my 90% prior is false, but the 90% prior from the first question is the same as the 90% posterior from the second question, and it’s totally kosher to say that that posterior is wrong (and, by extension, that I’m using the “wrong prior”).
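(Illustrative aside: a sketch of that two-step picture, with entirely made-up numbers. The specificity figure is first obtained as a posterior from a hypothetical validation study, and the same number is then treated as a fixed belief when interpreting a single patient’s test result.)

```python
# Sketch: "one man's prior is another man's posterior" with hypothetical numbers.
# Step 1: estimate specificity from a made-up validation study
# (90 of 100 disease-free subjects tested negative), uniform Beta(1,1) prior.
true_negatives, disease_free = 90, 100
specificity = (1 + true_negatives) / (2 + disease_free)   # posterior mean ~0.892

# Step 2: use that estimate as a fixed belief when interpreting one patient's
# positive result (sensitivity and base rate are also made up for illustration).
sensitivity, base_rate = 0.95, 0.01
false_positive_rate = 1 - specificity
p_positive = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
p_disease_given_positive = sensitivity * base_rate / p_positive
print("P(disease | positive test) ~", round(p_disease_given_positive, 3))
```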
The whole reflective consistency thing is that you shouldn’t have “foundational priors” in the sense that they’re not the posterior of anything. Every foundational prior gets checked by how well it accords with other things, and in that sense is sort of a posterior.
So I agree with cousin_it that it would be a problem if every Bayesian believed their prior to be correct (as in—they got the correct posterior yesterday to use as their prior today).
Vladimir is using “prior” to mean a map from streams of observations to probability distributions over streams of future observations, not the prior probability before updating. Follow the link in his comment.
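(Illustrative aside: in that sense a “prior” is a rule mapping whatever has been observed so far to a predictive distribution over what comes next, rather than a single pre-update number. Laplace’s rule of succession is a minimal example of such a map; the sketch below assumes binary observations and, for simplicity, returns only the probability of the next bit, though chaining it yields a distribution over whole future streams.)

```python
# Sketch: a "prior" as a map from observation streams to predictions,
# illustrated with Laplace's rule of succession for 0/1 observations.
from typing import Sequence

def laplace_prior(observations: Sequence[int]) -> float:
    # Probability that the next observation is 1, given the stream so far.
    return (sum(observations) + 1) / (len(observations) + 2)

print(laplace_prior([]))            # 0.5   -- the "prior probability before updating"
print(laplace_prior([1, 1, 1, 1]))  # ~0.83 -- the same object, after observations
```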