Non-Bayesianism for Bayesians (based on a poor understanding of Andrew Gelman and Cosma Shalizi)
Lakatos (and Kuhn) are philosophers of science who studied science as scientists actually practice it, as opposed to how scientists (at the time) claimed science is done. This is in contrast to taking the “scientific method” we learned in grade school literally: theories are not rejected at the first evidence that they have failed; they are patched, and so on.
Gelman and Shalizi’s criticism of Bayesian rhetoric (as far as I can make out from their blog posts and the slides of Gelman’s talk) is (explicitly) similar—what Bayesians do is different from what Bayesians say Bayesians do.
In particular, humans (as opposed to ideal, which is to say nonexistent, Bayesians) do not SIMPLY update on the evidence. There are other important steps in the process, such as checking whether, given the new data, your original model still looks reasonable. (This is “posterior predictive model checking”). This step looks a lot like computing a p-value, though Gelman recommends a graphical presentation, rather than condensing to a single number.
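Here’s a rough sketch of what such a check can look like in code (the model, data, and test statistic are all invented for illustration; none of this is from Gelman’s talk): draw replicated datasets from the fitted model and ask whether a statistic of the real data looks typical of the replications.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: the true process is heavy-tailed, but we (wrongly) model it as
# normal -- exactly the kind of mismatch a posterior predictive check can surface.
y = rng.standard_t(2, size=100)  # t distribution with df=2
n = len(y)

# Deliberately simplified posterior for the normal model: fix sigma at the sample
# standard deviation and use a flat prior on mu, so mu | y ~ N(mean(y), sigma^2 / n).
sigma = y.std(ddof=1)
mu_draws = rng.normal(y.mean(), sigma / np.sqrt(n), size=1000)

def test_stat(data):
    # chosen to be sensitive to the suspected misfit: the tails
    return np.max(np.abs(data))

# Replicated datasets drawn from the posterior predictive distribution.
t_rep = np.array([test_stat(rng.normal(mu, sigma, size=n)) for mu in mu_draws])
t_obs = test_stat(y)

# A posterior predictive p-value; Gelman would rather you plot t_rep against
# t_obs than collapse the comparison into this single number.
print("observed:", t_obs, " tail probability:", np.mean(t_rep >= t_obs))
```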
In general, the notion of doing research on which priors are decent ones for scientific practice—strong enough to capture knowledge that we really do have, and weak enough to adapt to the evidence, given sufficient evidence—is a non-Bayesian notion; a perfect Bayesian chooses their prior only once, and never changes it. Note that, historically, Jaynes worked on heuristics for how to choose a good prior, making him, in this sense, a non-Bayesian.
I saw an example that impressed me (and I can’t find the paper now to cite it!).
Suppose you have an urn A with many balls in it labeled A and one ball labeled Z; also an urn B with many (but fewer) balls in it labeled B and one ball labeled C; et cetera, until you finally have an urn Z with the fewest balls in it, labeled Z.
If we mix the urns and draw a ball from the mixture, which urn did it probably originally come from?
Suppose (because you’re a computationally-limited Bayesian) that you only include in your model the N highest-probability hypotheses. That is, you include A, B, and C in your model, but you neglect Z—that is, you put zero probability on it. (We can make Z’s pre-evidence probability arbitrarily small, to make this seem reasonable at the time.) When one, or even N, balls turn out to be labeled Z, the model (due to the initial zero probability on Z) continues insisting that the balls came from one of the initially-specified hypotheses.
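To make that concrete, here is a minimal sketch in Python (the ball counts are numbers I invented for illustration; the post doesn’t give any): compare the posterior over urns for a single Z-labeled draw under the full prior and under a truncated prior that drops urn Z.

```python
# Minimal sketch of the urn example. The ball counts below are invented for
# illustration; they are not taken from the post or the paper.

# Each urn: how many balls it holds in total, and how many of them are labeled "Z".
urns = {
    "A": {"total": 1000, "z_balls": 1},  # many A-labeled balls, one stray Z ball
    "B": {"total": 500,  "z_balls": 0},
    "C": {"total": 250,  "z_balls": 0},
    "Z": {"total": 5,    "z_balls": 5},  # the smallest urn; all of its balls are labeled Z
}

def posterior_given_z(prior):
    """P(urn | a drawn ball is labeled Z), for a single draw from the pooled balls."""
    likelihood = {u: urns[u]["z_balls"] / urns[u]["total"] for u in urns}
    unnorm = {u: prior.get(u, 0.0) * likelihood[u] for u in urns}
    total = sum(unnorm.values())
    return {u: round(p / total, 3) for u, p in unnorm.items()}

# Full prior: each urn's probability is proportional to how many balls it contributes.
n_all = sum(u["total"] for u in urns.values())
full_prior = {name: u["total"] / n_all for name, u in urns.items()}

# Truncated prior: keep only the highest-probability urns and drop urn Z outright.
kept = {name: p for name, p in full_prior.items() if name != "Z"}
norm = sum(kept.values())
truncated_prior = {name: p / norm for name, p in kept.items()}

print(posterior_given_z(full_prior))       # most of the mass lands on urn Z
print(posterior_given_z(truncated_prior))  # all of the mass is forced onto urn A
```

And the problem doesn’t go away with more data: however many Z-labeled balls you draw, each update just multiplies the truncated model’s zero by something, so its posterior on urn Z stays exactly zero.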
Of course, you could (and should) do a posterior predictive check, computing the probability that your model assigns to the observed data, and revise your model if the probability says your model is wack. However, that step “looks frequentist”, and isn’t explicitly included in the rhetoric of “Bayesian Statistics = Science”. Bayesians update on the evidence; they don’t revise their models!
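For the urn example, that check is about as simple as it gets (again with counts I invented): compute the probability the truncated model assigns to the labels you actually drew.

```python
# Continuation of the sketch above, with the same invented counts: the truncated
# model pools urns A, B and C, so the only Z-labeled ball in play is urn A's stray one.
pooled = {"A": 999, "Z": 1, "B": 500, "C": 250}  # label -> count among the pooled balls
total = sum(pooled.values())

observed = ["Z", "Z", "Z"]  # three draws in a row labeled Z (with replacement, for simplicity)

p_data = 1.0
for label in observed:
    p_data *= pooled.get(label, 0) / total

print(p_data)  # roughly 2e-10 -- the model thinks the data are astronomically
               # unlikely, which is the cue to revise the model, not just update within it.
```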
Anyway, don’t get caught up in factionalism and tribal us vs. them thinking!
I like your point but not your example.
Suppose (because you’re a computationally-limited Bayesian) that you only include in your model the N highest-probability hypotheses. That is, you include A, B, and C in your model, but you neglect Z—that is, you put zero probability on it. (We can make Z’s pre-evidence probability arbitrarily small, to make this seem reasonable at the time.) When one, or even N, balls turn out to be labeled Z, the model (due to the initial zero probability on Z) continues insisting that the balls came from one of the initially-specified hypotheses.
That isn’t just a computational limitation. It’s an outright bug. Something that assigns 0 to Z is just not even an approximation of a Bayesian. A sane agent with limited resources may, for example, assign a probability to “A, B, C, and ‘something else’”. If it explicitly assigned a probability of 0 (or arbitrarily close to 0) to Z, then it just fails at life.
Hi. I found the paper containing the example in question—it’s “Bayesians sometimes cannot ignore even very implausible theories”. I don’t understand everything in the paper, but it seems like they’ve anticipated your objection and have another example that explicitly includes a “Something else” case.
Forgive my confusion; I’m a bad statistician of any sort. How do you include ‘something else’ in your model? Don’t you need to at least (for Monte Carlo techniques) be able to generate “forward” from parameters to simulated data?
Or do you include Gelman’s posterior predictive check in the model somehow, so that data that is sufficiently surprising causes a “misspecification alarm” to go off?
I’m not sure what the best way is to simplify a model without doing insane things. I do know that if what you are doing amounts to overtly “putting zero probability on it”, then what you are doing is a terminal mistake that makes the process distinctly non-bayesian. I get the impression that the mistakes that bayesians are trying to correct with their after the fact testing of the model are different ones to this one. If common ‘bayesian statisticians’ do in fact make mistakes of this order, then consider me mistaken, but also consider their claims to be ‘bayesians’, more or less, lies.
I get the impression that the mistakes that bayesians are trying to correct with their after the fact testing of the model are different ones to this one.
If you choose a single model to work with, you are effectively putting zero probability on all other models (that are not contained in your chosen model as sub-models). Gelman’s posterior predictive checks aren’t motivated by this consideration (one of his non-mainstream-for-a-Bayesian stances is that model probabilities aren’t useful). Nevertheless, the checks are directed at identifying ways in which the model fits the data poorly, with an eye to guiding further model elaboration, so they do address this issue in a sense.
“putting zero probability on it”… is a terminal mistake that makes the process distinctly non-bayesian.
Philosophically this is true, but practically speaking, it’s not. Setting certain posterior probabilities to zero can be a good approximation to a fully Bayesian analysis (e.g., this paper). In fact, if it’s appropriate to use a small number of sigfigs in your results, this approximation can yield the exact same results far faster. I don’t think it’s fair to call the labeling of such an analysis as Bayesian a lie.
If you choose a single model to work with, you are effectively putting zero probability on all other models (that are not contained in your chosen model as sub-models).
I follow this reasoning, and it applies in many cases. The reason I do not consider it applicable to the example given is the explicit mention of “We can make Z’s pre-evidence probability arbitrarily small, to make this seem reasonable at the time.” That changes the meaning of the example significantly, in my understanding.
I claim that if Z is given enough consideration that ‘arbitrarily small’ is plugged in, rather than mere exclusion from a model, then it is just an error, not an approximation. There are valid examples of bayes-in-practice that support the position John takes, but I just don’t consider this example a fair representation: partly because the mistake is a bad way to handle urns, and partly because explicitly plugging in bad priors for Z should make you explicitly expect bad posteriors for Z. Exclusion from the model itself is a different problem.
Good answer. I neglected to read up-thread with enough thoroughness.
Good answer. I got a bit confused because Z has two meanings: “ball labelled Z was observed” (data), and “ball came from urn Z” (hypothesis). John’s model can assign zero probability to data that could possibly be observed, and that’s the big no-no.
How do you include ‘something else’ in your model? Don’t you need to at least (for Monte Carlo techniques) be able to generate “forward” from parameters to simulated data?
In the example provided, it would be by having the labels “A, B, C, and Zooblefuzz”, where Zooblefuzz is clearly defined as ‘any urn other than A, B, or C’.
For context: Gelman is a Bayesian and Shalizi is an anti-Bayesian.
When one, or even N balls turn out to be labeled Z, the model (due to the initial zero probability on Z) continues insisting that the balls came from one of the initially-specified hypotheses.
If Pr(ball labelled Z | urn) = 0 for all urns under consideration then Pr(ball labelled Z) = 0 too, so the model tries to evaluate 0 / 0 and crashes.
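A sketch of how a catch-all hypothesis avoids exactly this (the priors and likelihoods below are numbers I’m assuming, with “other” standing in for Zooblefuzz): give the leftover hypothesis a small prior and a deliberately vague likelihood, say uniform over every label you could conceivably see, so Pr(ball labelled Z) is never exactly zero and the 0 / 0 never arises.

```python
import string

# Illustrative sketch of a catch-all ("Zooblefuzz") hypothesis. The priors and
# likelihoods below are invented for the example, not taken from the thread.

labels = list(string.ascii_uppercase)  # every label we could possibly see

# Likelihood of each label under the urns we bothered to model...
likelihood = {
    "urn A": {"A": 0.999, "Z": 0.001},
    "urn B": {"B": 0.998, "C": 0.002},
    "urn C": {"C": 0.996, "D": 0.004},
    # ...plus a catch-all that spreads its mass over every label it could ever see,
    # so the probability of any observed label is never exactly zero.
    "other": {lab: 1 / len(labels) for lab in labels},
}

prior = {"urn A": 0.57, "urn B": 0.28, "urn C": 0.14, "other": 0.01}

def posterior(label):
    """P(hypothesis | observed label), which is now always well-defined."""
    unnorm = {h: prior[h] * likelihood[h].get(label, 0.0) for h in prior}
    total = sum(unnorm.values())  # strictly positive for any label, thanks to "other"
    return {h: round(p / total, 4) for h, p in unnorm.items()}

print(posterior("Z"))  # split between urn A's stray Z ball and "other"
print(posterior("Q"))  # a label no modeled urn can produce: all mass goes to "other"
```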
Tangent: I was a huge fan of Proofs and Refutations, which is about mathematics; is there a book of Lakatos’s on the philosophy of science you would recommend?
I liked Proofs and Refutations a lot too. However, I’m ashamed to admit I have no special knowledge of Lakatos. All I know about his philosophy of science stuff (which I believe is closely related) is from his Wikipedia page (and Feyerabend’s). Gelman’s slides made the analogy with Lakatos explicitly.