General question: I’ve read somewhere that there’s a Bayesian approach to at least partially justifying simplicity arguments / Occam’s Razor. Where can I find a good accessible explanation of this?
Specifically: Say you’re presented with a body of evidence and you come up with two sets of explanations for that evidence. Explanation Set A consists of one or two elegant principles that explain the entire body of evidence nicely. Explanation Set B consists of hundreds of separate explanations, each of which explains only a small part of the evidence. Assuming your priors for each individual explanation are about equal, is there a Bayesian explanation for our intuition that we should bet on Explanation Set A?
What about if your prior for each individual explanation in Set B is higher than the priors for the explanations in Set A?
Example:
Say you’re discussing Bible Criticism with a religious friend who believes in the traditional notion of complete Mosaic authorship but who is at least somewhat open to alternatives. To your friend, the priors for Mosaic authorship are much higher than the priors for a documentary or fragmentary hypothesis. (If you want numbers, say that your friend’s priors are .95 in favor of Mosaic authorship.)
Now you present the arguments, many of which (if I understand them correctly) boil down to simplicity arguments:
Mosaic authorship requires either a huge number of tortured explanations for individual verses, or it requires saying “we don’t know” or “God kept it secret for some reason”. Documentary-type hypotheses, on the other hand, postulate a few basic principles and use them to explain virtually everything.
Several different lines of local internal evidence often point to exactly the same conclusions. For example, an analysis of the repetitions within a story might lead us to divide up the verses between authors in a certain way, while at the same time an independent stylistic analysis leads us to virtually the same thing. So we again have a single explanation set that resolves multiple sets of difficulties, which again is simpler / more elegant than the alternative of proposing numerous individual explanations to resolve each difficulty, or just throwing up our hands and saying God keeps lots of secrets.
The question is, is your friend justified in rejecting your simplicity-based arguments based on his high priors? What about if his priors were lower, say .6 in favor of Mosaic authorship? What about if he held 50-50 priors?
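To make those numbers concrete, here’s a minimal sketch of a two-hypothesis Bayes update; the likelihood ratio of 20 in favor of the documentary-type hypothesis is something I’ve made up purely for illustration, not a claim about the actual strength of the evidence:

```python
# Hypothetical two-hypothesis Bayes update; the likelihood ratio is made up.
def posterior_mosaic(prior_mosaic, likelihood_ratio_against):
    """likelihood_ratio_against = P(evidence | documentary) / P(evidence | Mosaic)."""
    prior_odds = prior_mosaic / (1 - prior_mosaic)
    posterior_odds = prior_odds / likelihood_ratio_against
    return posterior_odds / (1 + posterior_odds)

for prior in (0.95, 0.6, 0.5):
    print(prior, round(posterior_mosaic(prior, 20), 3))
# 0.95 -> ~0.487, 0.6 -> ~0.070, 0.5 -> ~0.048
```

Under that assumed evidence, the .95 believer ends up roughly on the fence, while the .6 and 50-50 believers end up fairly confident the traditional view is wrong.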
The B approach to Occam’s razor is just a way to think carefully about your possible preference for simplicity. If you prefer simpler explanations, you can bias your prior appropriately, and then the B machinery will handle how you should change your mind with more evidence (which might possibly favor more complex explanations, since Nature isn’t obligated to follow your preferences).
I don’t think it’s a good idea to use B in settings other than statistical inference, or probability puzzles. Arguing with people is an exercise in xenoanthropology, not an exercise in B.
I don’t think it’s a good idea to use B in settings other than statistical inference, or probability puzzles.
I’m not sure exactly what you mean by this. Do you mean that Bayesianism is inappropriate for situations where the data points are arguments and explanations rather than quantifiable measurements or the like? Do you mean that it shouldn’t be used to prefer one person’s argument over another’s?
In any case, could you elaborate on this point? I haven’t read through much of the Sequences yet (I’m waiting for the book version to come out), but my impression was that using Bayesian-type approaches outside of purely statistical situations is a large part of what they are about.
Arguing with people is an exercise in xenoanthropology, not an exercise in B.
Not sure I understand this. Assuming you’re both trying to approach the truth, arguing with others is a chance to get additional evidence you might not have noticed before. That’s both xenoanthropology and Bayesianism.
my impression was that using Bayesian-type approaches outside of purely statistical situations is a large part of what they are about.
Yes. I disagree.
Do you mean that it shouldn’t be used to prefer one person’s argument over another’s?
Look at our good friend Scott Alexander dissecting arguments. How much actual B does he use? Usually just pointing out basic innumeracy is enough: “oh, you are off by a few orders of magnitude” (but that’s not B, that’s just being numerate, e.g. being able to add numbers, etc.).
Assuming you’re both trying to approach the truth...
I think the kind of stuff folks in this community use to argue/update internally is all fine, but I don’t think it’s a formal B setup usually, just some hacks along the lines of “X has shown herself to be thoughtful and sensible in the past, and disagrees w/ me about Y, so I should adjust my own beliefs.”
This will not work with outsiders, since they are generally playing a different game than you. I think the dominating term in arguments is understanding the social context in which the other side is operating, and learning how they use words. If B comes up at all, it’s just easy bookkeeping on top of that hard stuff.
I don’t understand what people here mean by “B.” For example, using Bayes’ theorem isn’t “B”, because everyone who believes the chain rule of probabilities uses Bayes’ theorem (so hopefully everyone).
Seems they’re referring to Bayesian Epistemology / Bayesian Confirmation Theory, along with informal variants thereof. Bayesian Epistemology is a very well-respected and popular movement in philosophy, although it is by no means universally accepted. In any case, the use of the term “Bayesian” in this sense is certainly not limited to LessWrong.
Assuming your priors for each individual explanation are about equal, is there a Bayesian explanation for our intuition that we should bet on Explanation Set A?
Do you mean your prior for A is about your prior for B, or your priors for each element are about the same?
If you mean the first, then there is no reason to favor one over the other. Occam’s razor just says the more complex explanation has a lower prior.
If you mean the second, then there is a very good reason to favor A. If A has n explanations and B has m, and all explanations are independent and each has probability p, then P(A) = p^n and P(B) = p^m. Since n is much smaller than m, A is exponentially more likely than B. In real life, assuming independence tends to be a bad idea, so the gap won’t be quite so extreme, but the simpler explanation is still favored.
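As a toy illustration of how big that gap gets (all numbers made up: n = 2, m = 200, and every individual explanation assumed independent with probability 0.9):

```python
# Toy numbers only; explanations treated as independent.
p = 0.9        # assumed probability of each individual explanation
n, m = 2, 200  # Set A: two principles; Set B: two hundred explanations

prob_A = p ** n
prob_B = p ** m

print(prob_A)           # ~0.81
print(prob_B)           # ~7.1e-10
print(prob_A / prob_B)  # A is ~1e9 times more probable under these assumptions
```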
I think you’ll get somewhere by searching for the phrase “complexity penalty.” The idea is that we have a prior probability for any explanation that depends on how many terms / free parameters are in the explanation. For your particular example, I think you need to argue that their prior probability should be different from what it is.
I think it’s easier to give a ‘frequentist’ explanation of why this makes sense, though, by looking at overfitting. If you look at the uncertainty in the parameter estimates, it depends roughly on the number of sample points per parameter. Thus the fewer parameters in a model, the more we expect each of those parameters to generalize. One way to think about this is that the more free parameters you have in a model, the more explanatory power you get “for free,” and so we need to penalize the model to account for that. Consider the Akaike information criterion and the Bayesian information criterion.
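If you want the penalty written out, here is a toy sketch of the AIC and BIC formulas applied to two hypothetical models; the parameter counts and log-likelihoods are invented for illustration:

```python
import math

# AIC = 2k - 2*lnL, BIC = k*ln(n) - 2*lnL, where k is the number of free
# parameters, n the number of data points, and lnL the maximized
# log-likelihood. Lower scores are better. All numbers below are invented.

def aic(k, lnL):
    return 2 * k - 2 * lnL

def bic(k, n, lnL):
    return k * math.log(n) - 2 * lnL

n = 100  # hypothetical sample size

# Simple model: 2 parameters, slightly worse fit.
print(aic(2, -150.0), bic(2, n, -150.0))    # 304.0  ~309.2
# Complex model: 30 parameters, somewhat better fit, but heavily penalized.
print(aic(30, -140.0), bic(30, n, -140.0))  # 340.0  ~418.2
```

Under both criteria the extra fit of the 30-parameter model doesn’t pay for its extra parameters, which is the frequentist cousin of giving more complex explanations a lower prior.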
General question: I’ve read somewhere that there’s a Bayesian approach to at least partially justifying simplicity arguments / Occam’s Razor. Where can I find a good accessible explanation of this?
This is a good question, but not when applied to the origin-of-the-Torah example. There, a more appropriate discussion is of the motivated cognition of the original Talmudic authors, who would have happily attributed 100% of the Torah to the same source were it not for the 8 verses that do not fit. For a Christian, these authors are already suspect because they denied the first coming of the Messiah, so one’s prior on their trustworthiness should be low to begin with.