Philosophical apologetics book suggests replacing Bayes’ theorem with “Inference to the Best Explanation” (IBE)
I’m about 2⁄3 through an apologetics book that was recommended to me, Menssen and Sullivan’s The Agnostic Inquirer, and was quite surprised to run into a discussion of Bayes’ theorem and wanted some input from the LW community. The book is quite philosophical and I admit that I am probably not following all of it. I find heady philosophy to be one of those areas where something doesn’t seem quite right (as in the conclusion that someone pushes), but I can’t always identify what.
In any case, the primary point of the book is to attempt to replace the traditional apologetics method with a new one. The status quo has been to appeal to “natural theology,” non-theological areas of discussion which attempt to bring one to the conclusion that some kind of theistic being exists, and from there establish that Christianity is the true formulation of what, exactly, this theistic being is/wants/does, etc by examining revealed theistic truths (aka the Bible). Menssen and Sullivan attempt to suggest that revelation need not be put off so long.
I don’t want to get too into it, but think this helps set the stage. Their argument is as follows:
(1) If it is not highly unlikely that a world-creator exists, then investigation of the contents of revelatory claims might well show that it is probable that a good God exists and has revealed.
(2) It is not highly unlikely that a world-creator exists.
(3) So, investigation of the content of a revelatory claim might well show it is probable that a good God exists and has revealed.
(4) So, a negative conclusion concerning the existence of a good God is not justified unless the content of a reasonable number of leading revelatory claims has been seriously considered. (p. 63)
Issues Menssen and Sullivan have with Bayes’ applicability to this arena:
Then they begin trying to choose the best method for evaluating revelatory content. This is where Bayes comes in. The pages are almost all available via Google books HERE in Section 4.2.1, beginning on page 173. They suggest the following limitations:
Bayesian probability works well when the specific values are known (they use the example of predicting the color of a ball to be drawn out of a container). In theology, the values are not known.
The philosophical community is divided about whether Bayesian probability is reliable, and thus everyone should be hesitant about it too if experts are hesitant.
If one wants to evaluate the probability that this world exists and there are infinitely many possibilities, n, then no matter how small a probability one assigns to each one, the sum will be infinite. (My personal take is to question whether a literal infinity can exist in nature… and if each of the n possibilities gets probability 1/n, then 1/n * n = 1, so maybe I’m not understanding their exact gripe.)
In some cases, they hold that prior probability is a useless term, as it would be “inscrutable.” For example, they use Elliott Sober’s example of gravity. What is its prior probability? If such a question is meaningless, they hold that “Has a good god revealed?” may be in the same category and thus Bayesian probability breaks down when one attempts to apply it.
There are so many components to certain questions that it would be nearly impossible or impossible to actually name them all and assign probabilities so that the computation accounted for all the bits of information required.
If Bayes’ theorem produces an answer that conflicts with answers arrived at via other means, one might simply tweak his/her Bayes values until the answer aligned with what was desired.
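For what it’s worth, the third limitation above only bites if every one of the infinitely many possibilities gets the same fixed probability; a prior that shrinks fast enough sums to 1. A minimal Python sketch (the geometric weighting here is just an illustrative choice, not anything from the book):

```python
# A minimal sketch (assumption: hypotheses can be enumerated h_0, h_1, ...).
# The "infinite sum" objection assumes every hypothesis gets the same fixed
# probability. A prior that shrinks with the index stays normalized instead.

def geometric_prior(n):
    """Prior weight for the n-th hypothesis: 2^-(n+1), for n = 0, 1, 2, ..."""
    return 2.0 ** -(n + 1)

# Partial sums converge to 1 rather than diverging:
partial = sum(geometric_prior(n) for n in range(50))
print(partial)  # very close to 1.0
```

Nothing hinges on the base 2 here; any positive weights that sum to 1 would do the same job.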
Their suggested alternative, Inference to the Best Explanation (IBE)
They define IBE as follows:
(1) If a hypothesis sufficiently approximates an ideal explanation of an adequate range of data, then the hypothesis is probably or approximately true.
(2) h1 sufficiently approximates an ideal explanation of d, an adequate range of data.
(3) So h1 is probably or approximately true.
Obviously the key lies in their definition of “ideal explanation.” They cover this in great detail, but it’s not all that specific. Basically, they want the explanation to be deductive (vs. inductive), grounded in “fundamental substances and properties” (basic, established truths), and to be overwhelmingly more true than contending hypotheses. You can read a bit more by reading the Google books section following the above.
I’m interested in takes on the above limitations that I’ve summarized in comparison with their suggested replacement, IBE. I’d be especially interested in hearing from those more familiar with philosophy. Menssen & Sullivan cover others’ definitions of IBE prior to presenting theirs, which suggests that it’s a common term in philosophy. My gut intuition is that Bayes should produce the same answer as IBE, but should also be more reliable, since IBE is not an inference by a well-defined method.
It seems like their proposal for IBE is doing precisely what Bayes’ theorem is doing… but they’ve just formalized a way to do it more sloppily, since they claim one can’t know the exact numbers to use in Bayes’ theorem. I can’t tell what, exactly, is different. You take your priors, factor in your “adequate range of data” (new evidence), and figure out whether the hypothesis, given the prior probability and adjusted for additional evidence, comes out more probable than not (making it probably or approximately true).
Is this just funny games? Is there something to the proposed limitations of Bayes’ theorem they present? I think Richard Carrier explained how he bypasses situations when exact numbers are not known (and how often are the numbers known precisely?) in his argument entitled Why I Don’t Buy the Resurrection Story. He specifies that even if you don’t know the exact number, you can at least say that something wouldn’t be less likely than X or more likely than Y. In this way you can use the limits of probability in your formula to still compute a useful answer, even if it’s not as precise as you would like.
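Carrier’s bounding move is easy to make concrete: run Bayes’ theorem once with the least favorable values you’ll grant and once with the most favorable, and the true posterior is pinned inside the resulting interval. A sketch with made-up numbers (the priors and likelihoods below are hypothetical, not Carrier’s):

```python
# A sketch of Carrier-style bounding (all numbers hypothetical): run Bayes'
# theorem at the extremes everyone grants, and the true posterior must lie
# inside the resulting interval.

def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H|E) by Bayes' theorem."""
    joint = prior * p_e_given_h
    return joint / (joint + (1 - prior) * p_e_given_not_h)

# The posterior rises with the prior and P(E|H), and falls with P(E|~H),
# so the worst case pairs a low prior/likelihood with a high alternative:
low = posterior(prior=0.01, p_e_given_h=0.5, p_e_given_not_h=0.9)
high = posterior(prior=0.10, p_e_given_h=0.9, p_e_given_not_h=0.5)
print(low, high)  # the true posterior lies somewhere in [low, high]
```

If even the most favorable bound stays small, the conclusion goes through without anyone agreeing on exact numbers.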
If anyone is interested in this in more detail, I could perhaps scan the relevant 20 or so pages on this and make them available somewhere. There’s only a few pages missing from Google books.
So abductive inference (IBE) is distinguished from enumerative induction. The former basically involves inverting deductive conditionals to generate hypotheses. So when asked why the grass is wet we immediately start coming up with deductive stories for how the grass might have gotten wet: “It rained, therefore the grass got wet”, “The sprinkler was on, therefore the grass got wet” (note that the idea that explanans deductively entail explanandum is almost certainly false, but this was the dominant position in philosophy of science when abductive inference was first discussed, and a lot of times people don’t update their versions of IBE to take into account the, much better, causal model of explanation). Then the available hypotheses are compared and the best one chosen according to some set of ideal criteria. In contrast, enumerative induction involves looking at lots of particular pieces of evidence and generalizing: “I’ve seen the grass be wet 100 times and 90 of those times it had rained, therefore it rained (with a 10% chance I’m wrong)”.
Now, the “ideal criteria” for an IBE differ from philosopher to philosopher but everyone worth their salt will include degree of consistency with past observation so IBE essentially subsumes enumerative induction. Usually the additional criteria are vaguer things like parsimony and generality. Now, since enumerative induction is about the frequency of observations it is more conducive to mathematical analysis and the Bayesian method. But the things IBE adds to the picture of inference aren’t things the Bayesian method has to ignore, you just have to incorporate the complexity of a hypothesis into its prior. But since objective Occam priors are the most confusing, controversial and least rigorous aspect of Bayesian epistemology there is room to claim that somehow our incorporation of economy into our inferences requires a vaguer, more subjective approach.
But that’s stupid. The fact that we’re bad Bayesian reasoners isn’t a rebuttal to the foundational arguments of Bayesian epistemology (though those aren’t fantastic either). Your inferential method still has to correspond to Bayes’ rule or your beliefs are vulnerable to being Dutch booked, and your behavior can be demonstrated to be suboptimal according to your own notion of utility (assuming certain plausible axioms about agency).
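The Dutch-book point can be shown with a toy example: an agent whose credences in A and not-A sum to more than 1 will regard a pair of bets as individually fair while the pair together guarantees a loss. A minimal sketch (all numbers hypothetical):

```python
# Toy Dutch book (hypothetical numbers): credences P(A) = 0.7 and
# P(not A) = 0.5 are incoherent because they sum to 1.2. An agent with
# those credences sees each bet below as fair, yet the pair guarantees a loss.

p_a, p_not_a = 0.7, 0.5
stake = 1.0  # each bet pays out `stake` if it wins

# A bet on X priced at credence * stake is "fair" by the agent's own lights.
total_cost = (p_a + p_not_a) * stake  # agent pays 1.2 in total
total_payout = stake                  # exactly one of A / not-A wins: 1.0

loss = total_cost - total_payout
print(loss)  # about 0.2, lost no matter how A turns out
```

The same construction works in reverse (selling both bets) when the credences sum to less than 1, which is why coherence forces P(A) + P(not A) = 1 exactly.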
That the authors say things like “If one wants to evaluate the probability that this world exists and there are infinitely many possibilities, n, then no matter how small a probability one assigns to each one, the sum will be infinite” suggests they are either unfamiliar with or reject an approach identifying the ideal prior with the computational complexity of the hypothesis (note that a strict enumerative inductive approach can be redeemed if facts about computational complexity are nothing more than meta-inductive facts about our universe(s)).
Whether one accepts that approach or not, it plainly can’t be worse than relying on evolved, instinctual or aesthetic preference when picking hypotheses—which I assume is where they’re going with this. One needn’t apply an explicitly Bayesian method at all to reject God on IBE grounds. Theism plainly fails any economy criterion one would want, and I could go on about why, but this comment needs to end.
Thanks for the comment; I think it aligns well with many of the rest of the comments as well. I actually would be interested to know what you mean by “fails any economy criteria.” I’m not familiar with that term.
An explanation is usually said to be economical if it is simple, general, elegant etc. In other words, whatever criteria you want to use in addition to ‘consistent with evidence’- this is mostly (entirely?) covered in these parts by discussing the complexity of the hypothesis. I’m just using it as a catch-all term for all those sorts of criteria for a good hypothesis. To make God seem like a ‘good’ hypothesis for something you need to pretty much invert your usual standards of inference.
Duh! In reading your response, it seems so simple, but for some reason when I read “economy” the first time, I just blanked as to what it would mean. I guess I’ve not been active enough around here lately. I understand, now. Complexity, more parts = lower probability by definition, Occam’s razor, etc.
I’m thinking your point is that any phenomenon’s explanation automatically decreases in economy when introducing beings and concepts vastly foreign to observation and experience. Would that be reasonably accurate?
Eh, yes.
Pretty much, no. The questions they are trying to answer, of “how does a system deal with a lack of exact numbers” and such, are approached with tools like replacing discrete probabilities with probability distributions and using maximum-ignorance or Kolmogorov-complexity-approximating priors.
More broadly, be skeptical of any weakenings. Going from quantitative posterior probabilities to qualitative “probably or approximately yes” judgments is suspicious (especially because posterior probabilities are about “probably or approximately” yes or no answers). Similarly, moving from cardinal probabilities (posteriors that are a decimal between 0 and 1 representing how likely a thing is to happen) to ordinal probabilities as Richard Carrier (see quote below) does is likely bad.
Thanks for the comment. This lines up with my [basic-level] thinking on this. It struck me as similar to EY’s point in Reductionism with his friend insisting that there was a difference between predictions resulting from Newtonian calculations and those found using relativity.
In a similar vein, they seem to insist that this area isn’t governed by Bayes’ theorem.
Lastly, I might have not credited Carrier well enough. He does assign cardinal values to his predictions. He simply makes the point that when we don’t know, we can use a “fringe” number that everyone agrees is at the low or high end. For example, he’s making a case against the resurrection and needs a value for the possibility that the Centurion didn’t properly verify Jesus’ death. Carrier says:
All I was pointing out is that Carrier, though making a case to those who disagree with him, tries to present some reasons why a person in that day and time might mistake a living (but wounded) person for being dead when they weren’t. Then he brings in a cardinal number, in essence saying, “You’ll grant me that there’s a 1 in 1000 chance that this guy made a mistake, right?”, and then he proceeds to use the value itself, not a qualitative embodiment.
Is that any clearer re. Carrier?
I googled “Inference to the Best Explanation”, and this paper by Gilbert Harman appears to be where the phrase was coined, although the general idea goes back further. More recently, there’s a whole book on the subject, and lukeprog’s web site has an introductory article by the author of that book.
There isn’t a single piece of mathematics in either of the two papers, which leads me to expect little of them. The book (of which I’ve only seen a few pages on Amazon) does contain a chapter on Bayesian reasoning, arguing that it and IBE are “broadly compatible”. This appears to come down to the usual small-world/large-world issue: Bayes is sound mathematics (say the small-worlders) when you already have a hypothesis that gives you an explicit prior, but it must yield to something else when it comes to finding and judging hypotheses.
That something else always seems to come down to magic. It may be called IBE, or model validation, or human judgement, but however many words are expended, no method of doing it is found. It’s the élan vital of statistics.
ETA: I found the book in my university library, but only the first edition of 1991, which is two chapters shorter and doesn’t include the Bayes chapter (or any other mathematics). In the introduction (which is readable on Amazon) he remarks that IBE has been “more a slogan than an articulated philosophical theory”, and that by describing inference in terms of explanation, it explains the obscure by the obscure. From a brief scan I was not sufficiently convinced that he fixes these problems to check the book out.
Thanks for the comment. The lack of math is a problem, and I think you’ve said it nicely:
Reading this book, The Agnostic Inquirer, is quite the headache. It’s so obscure and filled with mights, maybes, possiblys, and such that I constantly have this gut feeling that I’m being led into a mental trap but am not always sure which clauses are doing it. Same for IBE. It sounds common-sensically appealing. “Hey, Bayes is awesome, but tell me how you expect to use it on something like this topic? You can’t? Well of course you can’t, so here’s how we use IBE to do so.”
But the heuristic strikes me as simply an approximation of what Bayes would do anyway, so I was quite confused as to what they were trying to get at (other than perhaps have their way with the reader).
If you don’t allow exact probabilities for things, there are decisions it becomes impossible to make, such as whether or not to take a given bet. If you try to come up with a different method of choosing, you either end up with paradoxes, or you end up behaving exactly as if you were using Bayesian statistics.
This is only true if we assign them all the same probability. We tend to weight them by their complexity. Also, if you didn’t, the more complex possibilities would tend to contain the simpler ones, which may approach a limit as the number of possibilities considered increases.
Loved that point. Well said and I hadn’t thought of that.
Which is what I think they’re doing here. Coming up with some new formulation that may be operating within the realm of Bayes anyway.
I’d be interested in hearing more about this. Can you give an example of a paradox? Do you just mean that if your decision making method is not robust (when creating your own), you may end up with it telling you to both make the bet and not make the bet?
Either you would a) neither be willing to take a bet nor take the opposite bet, b) be willing to take a combination of bets such that you’d necessarily lose, or c) use Bayesian probability.
Thanks for the link and explanation.
This traditional line of apologetics is all very weird to me, a committed Christian of nearly 30 years. Seriously studying the scriptures is just what convinced me the Bible is not inspired by an intelligent or loving God.
I don’t have the knowledge (yet) to answer your questions, but I’m very interested in what others will say.
Thanks for providing that link. Loved reading through the comments and the first set was really refreshing (What reasons do you have for accepting the supernatural?).
That seems non-obvious to me. It’s highly problematic, sure—but not “key”. “Key” is “adequate range of data”. That cannot be an objective measure. It occurs to me that Bayes’ theorem has no such problem; it simply takes additional input and revises its conclusions as they come—it makes no presumption of its conclusions necessarily being representative of absolute truth.
I also, personally, take objection to:
I find it is highly unlikely that “a world-creator” exists, for two reasons. 1) Our universe necessarily possesses an infinite history (Big Bang + Special Relativity says this.) 2) Any ruleset which allows for spontaneous manifestation of an agentless system is by definition less unlikely than the rulesets which allow for the spontaneous manifestation of an agent that can itself manifest rulesets. (The latter being a subset of the former, and possessed of greater ‘complexity’—an ugly term but there just isn’t a better one I am familiar with; in this case I use it to mean “more pieces that could go wrong if not assembled ‘precisely so’”.)
As a person who is still neutral on this whole “Bayesian theory” thing (i.e., I feel no special attachment to the idea, and can’t say I entirely agree with the notion that our universe in no way truly behaves probabilistically), I can’t say that this topic as related is at all convincing.
Can you clarify? Big Bang is usually put a little more than 13 billion years ago; that’s a lot of time, but not infinity.
Here’s a thought experiment for you: Imagine that you’ve decided to take a short walk to the black hole at the corner 7-11 / Circle-K / ‘Kwik-E-Mart’. How long will it take you to reach the event horizon? (The answer, of course, is that you never will.)
As you approach the event horizon of a quantum singularity, time is distorted until it reaches an infinitesimal rate of progression. The Big Bang states that the entire universe inflated from a single point; a singularity. The same rules thus govern—in reverse: the first instants of the universe took an infinitely long period of time to progress.
It helps if you think of this as a two-dimensional graph, with the history of the universe as a line. As we approach the origin mark, the graph of history curves; the “absolute zero instant” of the Universe is thus shown to be an asymptotic limit; a point that can only ever be approached but never, ever reached.
If you decide to really walk inside, you could be well behind the horizon before you remember to check your watch and hit the singularity not long afterwards.
There are different times in general relativistic problems. There is the coordinate time, which is what one usually plots on the vertical axis of a graph. This is (with usual choice of coordinates) infinite when any object reaches the horizon, but it also lacks immediate physical meaning, since GR is invariant with respect to (almost) arbitrary coordinate changes. Then there may be times measured by individual observers. A static observer looking at an object falling into a black hole will never see the object cross the horizon, apparently it takes infinite time to reach it. But the proper time of a falling observer (the time measured by the falling observer’s clocks) is finite and nothing special happens at the horizon.
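The standard Schwarzschild formulas make the distinction explicit. A sketch with textbook results (assumptions: units G = c = 1, radial free fall from rest at infinity; this is the black-hole case, not the cosmological singularity, which needs a different metric):

```latex
% Proper time to fall from r_0 to r is finite, even across the horizon r = 2M:
\tau \;=\; \frac{2}{3\sqrt{2M}}\left( r_0^{3/2} - r^{3/2} \right)
% Coordinate time, by contrast, diverges logarithmically at the horizon:
t \;\approx\; \text{const} \,-\, 2M \ln\frac{r - 2M}{2M}
\;\longrightarrow\; \infty \quad \text{as } r \to 2M
```

The divergence lives entirely in the coordinate choice; nothing measured by the falling clock blows up.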
Correct, but since the entire universe was at that singularity, the distortion of time is relevant.
How exactly? It is the physical proper time since the Big Bang which is 13.7 billion years, isn’t it?
Yes and no. Since the first second took an infinitely long period of time to occur.
What does that mean? Do you say that proper time measured along geodesics was infinite between the Big Bang and the moment denoted as “first second” by the coordinate time, or that the coordinate time difference between those events is infinite while the proper time is one second?
The latter statement conforms to my understanding of the topic.
I agree. But now, how does that justify talking about infinite history? Coordinate time has no physical meaning, it’s an arbitrary artifact of our description and it’s possible to choose the coordinates in such a way to have the time difference finite.
How does it not? It’s a true statement: the graph of our history is infinitely long.
I can’t agree with that statement.
That much is true, but it fails to explain why the question “What happened before the Big Bang?” is as meaningless as “What’s further north than the North Pole?”
A graph of our history is not our history. Saying that our history is infinitely long because in some coordinates its beginning may have t=-\infty is like saying the North Pole is infinitely far away because it is drawn there on Mercator projection maps. Anyway, it’s not the graph of our history; there are many graphs and only some of them are infinitely long.
It would be actually helpful if you also said why.
We aren’t discussing the question “what happened before the Big Bang”, but rather “how long ago the Big Bang happened”.
It is currently unknown how to apply special relativity (SR) and general relativity (GR) to quantum systems, and it appears likely that they break down at this level. Thus applying SR or GR to black holes or the very beginning of the universe is unlikely to result in a perfectly accurate description of how the universe works.
I can see where you’re coming from. I may have mistaken “adequate range of data” for simply “range of data.” Thus it read more like, “I have this set of data. Which hypothesis is most closely like the ‘ideal explanation’ of this data.” Thus, the key piece of information will be in how you define “ideal explanation.”
Re-reading, I think both are critical. How you define the ideal still matters a great deal, but you’re absolutely right… the definition of an “adequate range” is also huge. I also don’t recall them talking about this, so that may be another reason why it didn’t strike me as strongly.
Could you explain this? I thought that the fact that our universe did behave probabilistically was the whole point of Bayes’ theorem. If you have no rules of probability, why would you have need for a formula that says if you have 5 balls in a bucket and one of them is green, you will pull out a green one 20% of the time? If the universe weren’t probabilistic, shouldn’t that number be entirely unpredictable?
Critical I can agree to. “Key” is a more foundational term than “critical” in my ‘gut response’.
The below might help:
In other words, a Bayesian believes that each trial will have a set outcome that isn’t ‘fuzzy’ even at the time the trial is initiated. The frequentist on the other hand believes that probability makes reality itself fuzzy until the trial concludes. If you had a sufficiently accurate predicting robot, to the Bayesian, it would be ‘right’ in one million out of one million coin flips by a robotic arm. To the frequentist, on the other hand, that sort of accuracy is impossible.
Now, I believe Bayesian statistical modeling to be vastly more effective at modeling our reality. However, I don’t think that belief is incompatible with a foundational belief that our universe is probabilistic rather than deterministic.
I can dig.
My initial response was, “No way Bayesians really believe that.” My secondary response was, “Well, if ‘sufficiently accurate’ means knowing the arrangement of things down to quarks, the initial position, initial angle, force applied, etc… then, sure, you’d know what the flip was going to be.”
If you meant the second thing, then I guess we disagree. If you meant something else, you’ll probably have to clarify things. Either way, what you mean by “sufficiently accurate” might need some explaining.
Thanks for the dialog.
When I was first introduced to the concept of Bayesian statistics, I had rather lengthy conversations on just this very example.
“Sufficiently accurate” means “sufficiently accurate”, in this case. Sufficient: being as much as needed; accurate: without error and precise. Synthesize the two and you have “being as without error and precise as needed”. Can’t get more clear than that, I fear.
Now, if I can read into the question you’re tending to with the request—well… let’s put it this way; there is a phenomenon called stochastic resonance. We know that quantum-scale spacetime events do not have precise locations despite being discrete phenomena (wave-particle duality): this is why we don’t talk about ‘location’ but rather ‘configuration space’.
Now, which portion of the configuration space will interact with which other portion in which way is an entirely probabilistic process. To the Bayesians I’ve discussed the topic with at any length, this is where we go ‘sideways’; they believe as you espoused: know enough points of fact and you can make inerrant predictions; what’s really going to happen is set in stone before the trial is even conducted. Replay it a trillion, trillion times with the same exact original conditions and you will get the same results every single time. You just have to get the parameters EXACTLY the same.
I don’t believe that’s a true statement. I believe that there is and does exist material randomness and pseudorandomness; and I believe further that while we as humans cannot ever truly exactly measure the world’s probabilities but instead only take measurements and make estimates, those probabilities are real.
Your “read into where I was tending with the request” was more like it. Sorry if I was unclear. I was more interested in what phenomena such a machine would have at its disposal—anything we can currently know/detect (sensors on the thumb, muscle contraction detection of some sort, etc.), only a prior history of coin flips, or all-phenomena-that-can-ever-be-known-even-if-we-don’t-currently-know-how-to-know-it? By “accurate” I was more meaning, “accurate given what input information?” Then again, perhaps your addition of “sufficiently” should have clued me in on the fact that you meant a machine that could know absolutely everything.
I’ll probably have to table this one as I really don’t know enough about all of this to discuss further, but I do appreciate the food for thought. Very interesting stuff. I’m intuitively drawn to say that there is nothing actually random… but I am certainly not locked into that position, nor (again) do I know what I’m talking about were I to try and defend that with substantial evidence/argument.
Funny thing. Just a few hours ago today, I was having a conversation with someone who said, “I need to remember, {Logos01}, that you use words in their literal meaning.”
It’s a common intuition. I have the opposite intuition. As a layman, however, I don’t know enough to get our postulates in line with one another. So I’ll leave you to explore the topic yourself.
Indeed. Whether I should have caught on, didn’t think about what you wrote or not, or perhaps am trained not to think of things precisely literally… something went awry :)
To my credit (if I might), we were talking fairly hypothetical, so I don’t know that it was apparent that the prediction machine mentioned would have access to all hypothetical knowledge we can conceive of. To be explicitly literal, it might have helped to just bypass to your previous comment:
That would have done it easier than reference to a prediction machine, for me at least. But again, I’m more of a noob, so mentioning this to a more advanced LWer might have automatically lit up the right association.
Sounds good. Thanks again for taking the time to walk through that with me!