Dissolving the Problem of Induction
The Problem of Induction, first published by David Hume in 1739, is potentially a longstanding crack in the foundations of science. From LessWrong’s Induction tag page:
Modern views of induction state that any form of reasoning where the conclusion isn’t necessarily entailed in the premises is a form of inductive reasoning [...] “The sun has always risen, so it will also rise tomorrow”. [...] Contrary to deduction, induction can be wrong since the conclusions depend on the way the world actually is, not merely on the logical structure of the argument.
There has historically been a problem with the justification of the validity of induction. Hume argued that the justification for induction could either be a deduction or an induction. Since deductive reasoning only results in necessary conclusions and inductions can fail, the justification for inductive reasoning could not be deductive. But any inductive justification would be circular.
When I was first taught the Problem of Induction as a college student, I felt the same way as Bertrand Russell:
[Russell] expressed the view that if Hume’s problem cannot be solved, “there is no intellectual difference between sanity and insanity”.
I felt unsure—maybe not on a gut level, but on an intellectual level—about the solidity of the literal ground under my feet. Sure, the ground has been solid every day of my life, but why should it continue to be solid for even another second? What’s the logic? What’s the justification?
I didn’t have a satisfactory answer at the time, so I just went on with my life. I was acting like a typical non-philosopher civilian, acting as if it’s okay to just step over a crack in the foundation of science, taking a leap of faith that the whole sane world won’t crumble like Russell warned, and going on with my other activities.
Then I read Eliezer’s 2008 post on the topic, titled “Where Recursive Justification Hits Bottom”:
Why do I believe that the Sun will rise tomorrow?
Because I’ve seen the Sun rise on thousands of previous days.
Ah… but why do I believe the future will be like the past?
Even if I go past the mere surface observation of the Sun rising, to the apparently universal and exceptionless laws of gravitation and nuclear physics, then I am still left with the question: “Why do I believe this will also be true tomorrow?”
I could appeal to Occam’s Razor, the principle of using the simplest theory that fits the facts… but why believe in Occam’s Razor? Because it’s been successful on past problems? But who says that this means Occam’s Razor will work tomorrow?
Eliezer explains that he’s comfortable using inductive reasoning to justify inductive reasoning because it’s a reflectively-stable equilibrium:
I start going around in a loop at the point where I explain, “I predict the future as though it will resemble the past on the simplest and most stable level of organization I can identify, because previously, this rule has usually worked to generate good results; and using the simple assumption of a simple universe, I can see why it generates good results; and I can even see how my brain might have evolved to be able to observe the universe with some degree of accuracy, if my observations are correct.”
He elaborates in a followup post that self-justifying reasoning is the best you can do when you’re reasoning about your own reasoning:
I don’t think that going around in a loop of justifications through the meta-level is the same thing as circular logic. I think the notion of “circular logic” applies within the object level, and is something that is definitely bad and forbidden, on the object level. Forbidding reflective coherence doesn’t sound like a good idea. But I haven’t yet sat down and formalized the exact difference—my reflective theory is something I’m trying to work out, not something I have in hand.
I understand Eliezer to be saying: Sure, the Problem of Induction is a problem we have to solve, a potential crack in the foundation of science that we have to do something about, but we can solve it with a totally fine one-off hack: we just paper over the crack by granting ourselves an axiom that we live in a universe where induction works. Once we paper over the crack with that axiom, we don’t have to fear that the crack will open up and swallow us up, because when we (humans and superintelligent AIs) introspect on the crack from within our papered-over world, it’s not a crack anymore; it has every appearance of a solid foundation.
I had figured this was probably the end of the story with the Problem of Induction. And I found it satisfying enough… Until today, when I read David Deutsch’s 1997 book, The Fabric of Reality.
Deutsch doesn’t exactly attempt to solve the Problem of Induction; he instead says there is no Problem of Induction.
How Scientific Theories Actually Work
First, Deutsch points out that “induction” is a terrible characterization of how actual scientific theories get generated and accepted:
It is hard to know where to begin in criticizing the inductivist conception of science — it is so profoundly false in so many different ways. Perhaps the worst flaw [of the inductivist conception of science], from my point of view, is the sheer non sequitur that a generalized prediction is tantamount to a new theory. [P. 59]
For example, it doesn’t do justice to Einstein’s two theories of relativity to say that his work consisted of extrapolating a bunch of similar observations. It’s more accurate to describe him as struggling to navigate the vastness of concept-space in order to locate a mathematical structure that fit various constraints, such as keeping the speed of light constant in every reference frame.
Furthermore, in many cases it only became possible to make a “generalization” or “extrapolation”, or claim that future observations would be “similar” to past ones, once the equations of relativity were available to operationalize those terms in a theory-specific way.
Eliezer himself says that it’s possible to hypothesize General Relativity not from a bunch of repetitive observations, but from a single observation examined by a sufficiently capable scientist:
A Bayesian superintelligence, hooked up to a webcam, would invent General Relativity as a hypothesis—perhaps not the dominant hypothesis, compared to Newtonian mechanics, but still a hypothesis under direct consideration—by the time it had seen the third frame of a falling apple. It might guess it from the first frame, if it saw the statics of a bent blade of grass.
Deutsch’s insight about the Problem of Induction, which he credits to Karl Popper, is that it makes a type error: the type of a Hypothesis is not a hand-wavy Generalized Prediction, it’s a detailed computational Model. And what a scientist’s Observation→Hypothesis function does isn’t generalization, it’s reverse engineering.
Our best scientific theories are the ones that yield the highest compression factor on our observations and predictions. They didn’t win a contest for “best application of induction” against other candidate theories; they won out either because the other theories were ruled out by observation, or because the other theories didn’t yield as high of a compression factor.
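To make “compression factor” concrete, here is a minimal sketch that uses zlib as a crude stand-in for description length. This is an illustrative assumption, not how Deutsch or the post measures theories: the observation log and the `compression_factor` helper are hypothetical, and a general-purpose compressor only detects shallow regularities.

```python
import random
import zlib


def compression_factor(data: bytes) -> float:
    """Raw size divided by zlib-compressed size (higher means more compressible)."""
    return len(data) / len(zlib.compress(data, 9))


# A highly regular record: the same observation logged over and over.
regular = b"day: sun rose; " * 1000

# A patternless record of the same length, from a seeded pseudo-random generator.
rng = random.Random(0)
patternless = bytes(rng.getrandbits(8) for _ in range(len(regular)))

print(f"regular:     {compression_factor(regular):.1f}x")
print(f"patternless: {compression_factor(patternless):.2f}x")
```

The regular record shrinks dramatically while the patternless one does not shrink at all, which is the sense in which a universe with stable structure is one that admits short models.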
It’s Not About Past vs. Future
Consider Eliezer’s words again (emphasis mine):
I start going around in a loop at the point where I explain, “I predict the future as though it will resemble the past on the simplest and most stable level of organization I can identify, because previously, this rule has usually worked to generate good results; and using the simple assumption of a simple universe, I can see why it generates good results; and I can even see how my brain might have evolved to be able to observe the universe with some degree of accuracy, if my observations are correct.”
In the emphasized part, Eliezer is implicitly making the retreat from the obviously false claim that surface-level features of the universe always stay consistent, to the more defensible claim that the universe has a “simple and stable level of organization” where things always stay consistent.
But how defensible is it that “the future will resemble the past” on any level? Do we really want to try and salvage any form of this claim?
Deutsch points out that the only sense in which the future actually resembles the past, generally speaking, is a trivial one: a theory explains how multiple pieces of the universe (multiple regions of time, multiple regions of space, two forces, two trajectories, etc) are related in some way. We can trivially claim that a theory of gravity expresses a fundamental “similarity” between the motions of a bullet and a satellite, even though one falls and one orbits.
The key underlying assumption of Eliezer’s statement is: “I am able to identify simpler and more stable levels of organization in the universe than what meets the eye”. But if we grant that those levels of organization exist, we’re granting that the universe is compressible: that it has properties shared by different areas of space, properties shared by different types of matter, properties shared by different types of forces, properties shared by different trajectories of motion, etc.
Having granted all that, we don’t need to grant an additional license to “predict the future as though it will resemble the past”. Believing that a property is shared by different times of the universe—namely, the past and future—doesn’t require a special-case justification.
In Deutsch’s words:
The best existing theories, which cannot be abandoned lightly because they are the solutions of problems, contain predictions about the future. And these predictions cannot be severed from the theories’ other content[...] because that would spoil the theories’ explanatory power. Any new theory we propose must therefore either be consistent with these existing theories, which has implications for what the new theory can say about the future, or contradict some existing theories but address the problems thereby raised, giving alternative explanations, which again constrains what they can say about the future. [P. 162]
Solving vs. Dissolving*
Here’s what I see as solving vs. dissolving the Problem of Induction:
Eliezer attempted to solve the Problem of Induction by licensing a “reflective loop through the meta level”.
Deutsch dissolved the Problem of Induction by pointing out that induction doesn’t actually play a role in science. Science is a reverse-engineering exercise that doesn’t rely on the assumption that “the future will be similar to the past”.
When we understand the business of science as reverse-engineering a compressed model of the universe, I don’t think its justification relies on a “loop through the meta level”. Although, admittedly, it does rely on Occam’s Razor.
The Problem of Occam’s Razor
I believe Eliezer and others have to some degree conflated the original Problem of Induction with the easier problem of justifying an Occamian Prior. Eliezer writes:
There are possible minds in mind design space who have anti-Occamian and anti-Laplacian priors; they believe that simpler theories are less likely to be correct, and that the more often something happens, the less likely it is to happen again.
And when you ask these strange beings why they keep using priors that never seem to work in real life… they reply, “Because it’s never worked for us before!”
Now, one lesson you might derive from this, is “Don’t be born with a stupid prior.” This is an amazingly helpful principle on many real-world problems, but I doubt it will satisfy philosophers.
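As a rough illustration of the two kinds of prior in that quote, the sketch below contrasts Laplace’s rule of succession with a mirrored rule. The mirrored `anti_laplace_next` rule is a hypothetical construction for this example only; Eliezer’s post does not specify a formula.

```python
from fractions import Fraction


def laplace_next(successes: int, trials: int) -> Fraction:
    """Laplace's rule of succession: P(next success) = (s + 1) / (n + 2)."""
    return Fraction(successes + 1, trials + 2)


def anti_laplace_next(successes: int, trials: int) -> Fraction:
    """Hypothetical mirrored rule: the more successes seen, the fewer expected."""
    return Fraction(trials - successes + 1, trials + 2)


# After watching the sun rise on 10 out of 10 mornings:
print(laplace_next(10, 10))       # the Laplacian reasoner's estimate
print(anti_laplace_next(10, 10))  # the anti-Laplacian reasoner's estimate
```

Both rules are internally coherent probability assignments, which is why the choice between them has to be justified by something other than consistency.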
In my view, it’s a significant and under-appreciated milestone that we’ve reduced the original Problem of Induction to the problem of justifying Occam’s Razor. We’ve managed to drop two confusing aspects from the original PoI:
We don’t have to justify using “similarity”, “resemblance”, or “collecting a bunch of confirming observations”, because we know those things aren’t key to how science actually works.
We don’t have to justify “the future resembling the past” per se. We only have to justify that the universe allows intelligent agents to learn probabilistic models that are better than maximum-entropy belief states.
Qiaochu Yuan thinks the Problem of Occam’s Razor might even be solvable with a simple counting argument:
Roughly speaking, weak forms of Occam’s razor are inevitable because there just aren’t as many “simple” hypotheses as “complicated” ones, whatever “simple” and “complicated” mean, so “complicated” hypotheses just can’t have that much probability mass individually. (And in turn the asymmetry between simple and complicated is that simplicity is bounded but complexity isn’t.)
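The counting argument can be sketched numerically under one simplifying assumption that is mine, not Qiaochu’s: hypotheses are identified with binary strings, and “complexity” is string length. The helper names are hypothetical.

```python
from fractions import Fraction


def hypotheses_of_length(bits: int) -> int:
    """Number of distinct binary strings with exactly `bits` bits."""
    return 2 ** bits


def hypotheses_up_to(bits: int) -> int:
    """Number of binary strings of length 1..bits: 2 + 4 + ... + 2**bits."""
    return 2 ** (bits + 1) - 2


def max_average_mass(bits: int) -> Fraction:
    """Total probability is at most 1, so the average share per hypothesis
    of a given length can be at most 1 / (number of such hypotheses)."""
    return Fraction(1, hypotheses_of_length(bits))


for bits in (4, 16, 64):
    print(bits, hypotheses_of_length(bits), max_average_mass(bits))
```

Whatever prior is chosen, the bound on the average mass per hypothesis falls exponentially with length, so only the comparatively few short hypotheses can each hold a large share.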
I feel like the Problem of Occam’s Razor is analogous to the problem of “placing math on a more secure foundation so that we really know for sure that 1+1=2”. Sure, it’s worth doing, but it’s not such a big crisis that we really have to wonder whether 1+1=2.
I feel much calmer and more confident about a quest to figure out how to place Occamian priors on a secure foundation, than I previously felt about a misguided quest to figure out why our universe should fundamentally owe us a future that’s similar to our past.
*Catchier than “Explaining vs. Explaining Away”, amirite?
This is one of those areas where I think people on LessWrong would benefit from reading more academic philosophy.
It’s been a while since I took academic philosophy classes, but I’m pretty sure the Problem of Induction and the Problem of Justifying Occam’s Razor have been known to be basically the same thing for at least a century. When they were taught to me in undergrad, they were presented that way, IIRC. When I taught my undergrads the problem, that’s certainly how I presented it.
I do agree that it constitutes a significant milestone of intellectual progress though!
As for the counting argument, that’s less well known. When I first heard it in undergrad (in the context of learning about Solomonoff Induction) it struck me as another important milestone of intellectual progress, one that doesn’t solve the problem but probably brings us closer. I felt the same way about one of the things that makes Solomonoff Induction work (how hypotheses that are simpler have “more look-alikes” and thus, when grouped together with their look-alikes, have more measure) and about subjective Bayesianism. Finally, I’m excited about something proven in Logical Induction: that logical inductors have the Occam Property. I still haven’t got around to understanding it deeply and thinking about what it means though. All in all I remain optimistic that the problem of induction has a solution.
To clarify, what I think is underappreciated (and what’s seemingly being missed in Eliezer’s statement about his belief that the future is similar to the past), isn’t that justifying an Occamian prior is necessary or equivalent to solving the original Problem of Induction, but that it’s a smaller and more tractable problem which is sufficient to resolve everything that needs to be resolved.
Edit: I’ve expanded on the Problem of Occam’s Razor section in the post:
I think you still do. In terms of induction, you still have the problem of grue and bleen. In terms of Occam’s Razor, it’s the problem of which language a description needs to be simple in.
Justifying that blue is an a-priori more likely concept than grue is part of the remaining problem of justifying Occam’s Razor. What we don’t have to justify is the wrong claim that science operates based on generalized observations of similarity.
Philosophy in general abounds with type errors. Many gotchas in gedankenexperiments rely on mucking about with reference-referent relations at different points in the chain of reasoning. Changing language so this sort of switcheroo is more easily seen was Korzybski’s main goal.
The tl;dr version imo (Science and Sanity is very long):
“Why” can refer to at least 4 different things, and more if you split up the dimensions of time, and variance/invariance. Repeated why steps at different levels of abstraction can switch freely between them.
The to-be verb form is doing lossy compression disguised as lossless compression.
Positive and negative evidential types are freely switched between without us noticing. Consider how much you need to pause to really understand the potential differences between “I perceive that she does not have a cup” vs “I do not perceive that she has a cup.”
Standard set theory problems wrt language à la Wittgenstein.
Comment promoted to frontpage.
.
Just kidding, but the compression ratio in this comment was awesome.
I feel like there’s something slippery happening when you claim
I think at best you can say Deutsch dissolves the problem for the project of science, but this is not the same thing as dissolving the problem of induction, which is generally considered impossible because it exists because of the problem of the criterion, i.e. how can you know the criterion by which you know something is true if you don’t first know some true thing. And although reducing the problem of induction to the problem of justifying Occam’s razor is helpful, it just pushes the problem around, because at some point you still have issues where you’ve reduced things as far as you can and you still have some question of the form “but how do I really know this?”. After all, I might ask about the proposed justification of Occam’s razor something like “why probabilities?”, and you better hope the answer is not some version of “because they are simpler than alternatives”.
This is not to say we can’t get on with projects like science, only that there’s an epistemological gap we have to cover over, as you note. The general solution to this is called “pragmatism” and the specific solution in epistemology to this particular problem of justifying anything is called “particularism” because you pick some particular statement(s) to claim as true and go forth on their unjustified assumption.
If that’s not satisfying, epistemological nihilism is also an option if you don’t want to have to take a leap of faith to make some unjustified assumptions (i.e. propose some axioms), but it’s not a very useful position if you want to make distinctions about the world because it collapses them.
Ok I think I’ll accept that, since “science” is broad enough to be the main thing we or a superintelligent AI cares about.
Nobody believes any more that induction is the sole source of scientific explanations. That was a feature of very early philosophy of science, such as Bacon’s.
Not all science is based on explanation. Sometimes, you have to be satisfied with predicting future occurrences of a phenomenon without understanding the underlying mechanism. This can be called “curve fitting” or “finding empirical laws”. Either way, it is induction.
So: induction has uses other than hypothesis generation, meaning that you can’t dissolve the problem of induction by pointing out that it isn’t needed for hypothesis generation.
Since “no one believes that induction is the sole source of scientific explanations”, and we understand that scientific theories win by improving on their competitors in compactness, then the Problem of Induction that Russell perceived is a non-problem. That’s my claim. It may be an obvious claim, but the LW sequences didn’t seem to get it across.
You seem to be saying that induction is relevant to curve fitting. Sure, curve fitting is one technique to generate theories, but tends to be eventually outcompeted by other techniques, so that we get superseding theories with reductionist explanations. I don’t think curve fitting necessarily needs to play a major role in the discussion of dissolving the Problem of Induction.
I am saying that prediction is valuable per se, that curve fitting gives you predictions, and that curve fitting is induction, and that induction is therefore needed in spite of Deutsch’s argument.
Induction is also important for everyday reasoning.
If you think of a theory as something that does nothing but make predictions, then induction is generating theories... but it is not an explanatory theory, in terms of the standard explanatory/empirical distinction.
Unfortunately, the belief that theories can be losslessly represented as programmes elides the distinction.
Just because curve fitting is one way you can produce a shallow candidate model to generate your predictions, that doesn’t mean “induction is needed” in the original problematic sense, especially considering that what’s likely to happen is that a theory that doesn’t use mere curve fitting will probably come along and beat out the curve fitting approach.
If you assume that all science is theoretical, and/or you have endless time to generate the perfect theory, that is true.
But neither assumption is true.
Induction is vital for practical purposes. If your world is being ravaged by a disease, you need to understand its progression ahead of having a full theory. Our ancestors needed to understand that the berry that made you sick yesterday will make you sick today... and to do that well ahead of having a theory of biochemistry.
Inductive reasoning is important for survival, not just for relative luxuries like science.
Curve fitting isn’t Problematic. The reason it’s usually a good best guess that points will keep fitting a curve (though wrong a significant fraction of the time) is because we can appeal to a deeper hypothesis that “there’s a causal mechanism generating these points that is similar across time”. When we take our time and do actual science on our universe, our theories tell us that the universe has time-similar causal structures all over the place. Actual science is what licenses quick&dirty science-like heuristics.
You’re subsuming the epistemic problem of induction under the ontological problem of induction, but you haven’t offered a solution to the ontological problem of induction.
Edit:
How do you know that the world is stable? Effect has followed cause in the past, but stability means that it will also do so in the future... but to think that it will do so in the future because it has done so in the past is inductive reasoning.
I guess people are upvoting this because they found it useful, but the statement that you don’t need to directly prove induction, but that you can indirectly prove it via proving Occam’s Razor seems kind of obvious and not particularly interesting to me. And it seems to me that you’re reducing it to a harder problem in that resemblance of the past to the present is just one particular way in which a model can be simple. Indeed, you could use the counting argument directly on induction. Anyway, I’ll give this a second read and see if there’s anything I missed.
EDIT: See my second comment, as I didn’t fully understand it after my first read.
Well, I hope this post can be useful as a link you can give to explain the LW community’s mostly shared view about how one resolves the Problem of Induction. I wrote it because I think the LW Sequences’ treatment of the Problem of Induction is uncharacteristically off the mark.
I’m glad you wrote a post about this topic. When I was first reading the sequences, I didn’t find the posts by Eliezer on Induction very satisfying, and it was only after reading Jaynes and a bunch of papers on Solomonoff induction that I felt I had a better understanding of the situation. This post might have sped up that process for me by a day or two, if I had read it a year ago.
There was a little while where I thought Solomonoff Induction was a satisfying solution to the problem of induction. But there doesn’t seem to be any justification for the order over hypotheses in the Solomonoff Prior. Is there discussion/reading about this that I’m missing?
There are several related concepts (mostly from ML) that have caused me a lot of confusion, because of the way they overlap with each other and are often presented separately. These included Occam’s Razor and The Problem of Induction, and also “inductive bias”, “simplicity”, “generalisation”, overfitting, model bias and variance, and the general problem of assigning priors. I’d like there to be a post somewhere explaining the relationships between these words. I might try to write it, but I’m not confident I can make it clear.
Actually, I do appreciate you highlighting this, however, it’s because I think that Eliezer’s solution is somewhat underappreciated, which seems to be the opposite of what you think.
Ah yeah. Interesting how all the commenters here are talking about how this topic is quite obvious and settled, yet not saying the same things :)
Okay, it’s making a bit more sense now that I’ve reread It’s Not About Past And Future. If you just looked at the position of each particle at time t, we’d all be in different places due to the rotation of the Earth and electrons would be in a different part of their orbit. So we aren’t really making a similarity claim about primitives, but about the higher-level patterns and your claim is that if we admit that the universe follows these patterns then this automatically means that these patterns will apply in the future.
I don’t know. I don’t think we know that the universe follows these patterns as opposed to appearing to follow these patterns. And even if the universe has matched these patterns, it doesn’t mean that it has followed it in terms of these patterns being the causal reason for our observations, as opposed to some more complex pattern that would also explain it.
Yeah. My point is that the original statement of the Problem of Induction was naive in two ways:
It invokes “similarity”, “resemblance”, and “collecting a bunch of confirming observations”
It talks about “the future resembling the past”
#1 is the more obviously naive part. #2’s naivety is what I explain in this post’s “Not About Past And Future” section. Once one abandons naive conceptions #1 and #2 by understanding how science actually works, one reduces the Problem of Induction to the more tractable Problem of Occam’s Razor.
Hm, I see this claim as potentially beyond the scope of a discussion of the Problem of Induction.
“Hm, I see this claim as potentially beyond the scope of a discussion of the Problem of Induction.”
Not quite—because in order to avoid the problem of induction you need the universe to be following these patterns in the specific sense that these patterns are what is causing what we observed—not just for the universe to appear to follow these patterns.
If we reverse-engineer an accurate compressed model of what the universe appears like to us in the past/present/future, that counts as science.
If you suspect (as I do) that we live in a simulation, then this description applies to all the science we’ve ever done. If you don’t, you can at least imagine that intelligent beings embedded in a simulation that we build can do science to figure out the workings of their simulation, whether or not they also manage to do science on the outer universe.
If we live in a simulation, then it’s likely to be turned off at some point, breaking the induction hypothesis. But then, maybe it doesn’t matter as we wouldn’t be able to observe this.
The problem of induction is more than one thing, because everything is more than one thing.
The most often discussed version is the epistemic problem, the problem of justifying why you should believe that future patterns will continue. That isn’t much affected by ontological issues like whether the universe is simulated. Using probabilistic reasoning, it still makes sense to bet on patterns continuing, mainly because you have no specific information about the alternatives. But you do need to abandon certainty and use probability if ontology can pull the rug from under you.
The ontological problem is pretty much equivalent to the problem of the nature of physical law: what makes the future resemble the past? The standard answer, that physical laws are just descriptions, does not work.
Theories of how quarks, electromagnetism and gravity produce planets with intelligent species on them are scientific accomplishments by virtue of the compression they achieve, regardless of why quarks appear to be a thing.
There’s no general agreement on what science is supposed to achieve—specifically, there is an instrumentalism versus realism debate. For realists, it does matter if science fails to discover what’s really real.
If I have two different data sets and compress each of them well, I would not expect those compressions to be similar or the same. Sure, extrapolation across time is nothing special. But there is still a step of applying the model to data it was not formed on. The scientist reverse-engineers what is going on, sure, but then he acts like that solution has some relevance to what happens tomorrow, to what happens in the place east of known maps, and as if objects that have not yet been constructed would play to their whims.
In a spatial variant, from multiple posts and crossplanks one could get the idea that there is a fence, which could be really useful in predicting and talking about posts. But then the fence can suddenly come to an end or make an unexpected 90-degree turn. How many posts do you need to see to reasonably conclude that post number 5000 exists?
Sure, you could have a scenario where you try to compress previous life experience and, when new experience comes in, use the old solution as the first guess in compressing the extended whole. Conservation of expected evidence would say that you can’t ever really discredit the possibility of having to genuinely recompress. But it seems that at least the psychological attitude that a guess that has stood for a long time will stand for a lot longer is easy to fall into. I guess the people who test theories under weird marginal conditions can appreciate that, even if the formulation doesn’t change, that activity changes its reliability far more than just more and more of the same kind of stresses. Representation quality would then be connected to the cultural legacy direction: having the same representation passed down two different paths carries different information content.
For example, it wouldn’t be that implausible that if an organism changed its beliefs and interpretations more when it was poorly fed and less when it was well fed, then after a long grind the representations would be pretty satiating. They wouldn’t prove or justify being satiated, just in fact cause satiation. It just happens that the property of being less surprised is epistemologically interesting.
If I drop two staplers, I can give the same compressed description of the data from their two trajectories: “uniform downward acceleration at close to 9.8 meters per second squared”.
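A minimal sketch of that claim, using simulated rather than measured data: the two drops below are generated from the 9.8 m/s² figure itself, and the drop heights, sample times, and function names are all hypothetical. The point is only that one fitted parameter summarizes both trajectories.

```python
G = 9.8  # m/s^2, the value quoted above; used here to simulate the data


def simulate_drop(height_m: float, times_s: list[float]) -> list[float]:
    """Heights of an object released from rest at height_m, ignoring air resistance."""
    return [height_m - 0.5 * G * t * t for t in times_s]


def fit_acceleration(height_m: float, times_s: list[float], heights_m: list[float]) -> float:
    """Least-squares fit of a in the model: distance fallen = 0.5 * a * t**2."""
    numerator = sum((height_m - y) * t * t for t, y in zip(times_s, heights_m))
    denominator = sum(t ** 4 for t in times_s)
    return 2.0 * numerator / denominator


times = [0.1, 0.2, 0.3, 0.4]
stapler_a = simulate_drop(1.0, times)  # dropped from 1 m
stapler_b = simulate_drop(2.0, times)  # dropped from 2 m

accel_a = fit_acceleration(1.0, times, stapler_a)
accel_b = fit_acceleration(2.0, times, stapler_b)
print(accel_a, accel_b)
```

Eight raw height readings reduce to a starting height per stapler plus a single shared acceleration.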
If I found the blueprint for the fence lying around, I’d assign a high probability that the number of fenceposts is what’s shown in the blueprint, minus any that might be knocked over or stolen. Otherwise, I’d start with my prior knowledge of the distribution of sizes of fences, and update according to any observations I make about which reference class of fence this is, and yes, how many posts I’ve encountered so far.
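Here is a rough sketch of the no-blueprint case. It assumes a made-up prior over fence lengths, proportional to 1/n² and capped at 100,000 posts; a real prior would come from actual knowledge about fences. All names and numbers are hypothetical.

```python
N_MAX = 100_000  # assumed upper bound on fence length, in posts


def prior_weight(n_posts: int) -> float:
    """Unnormalized prior: long fences are assumed rarer than short ones."""
    return 1.0 / (n_posts * n_posts)


def tail_weight(start: int) -> float:
    """Total prior weight on fences with at least `start` posts."""
    return sum(prior_weight(n) for n in range(start, N_MAX + 1))


def prob_post_exists(target: int, posts_seen: int) -> float:
    """P(fence has at least `target` posts | at least `posts_seen` posts observed)."""
    if posts_seen >= target:
        return 1.0
    return tail_weight(target) / tail_weight(max(posts_seen, 1))


for seen in (10, 1000, 4999):
    print(seen, prob_post_exists(5000, seen))
```

Under this assumed prior the probability that post 5000 exists rises steadily with each post observed but only reaches certainty once the post itself is seen, which matches the idea that a predictive model outputs a distribution rather than a single forced outcome.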
It seems like you haven’t gotten on board with science being a reverse-engineering process that outputs predictive models. But I don’t think this is a controversial point here on LW. Maybe it would help to clarify that a “predictive model” outputs probability distributions over outcomes, not predictions of single forced outcomes?
And if I release two balloons they will have “uniform upward acceleration at close to 9.8 meters per second squared until terminal velocity”. For proper law-like things you expect them to hold with no or minimal revision. That it is a compression makes application to new cases complicated. How do you compress something you don’t have access to?
How do you know that a given blue piece of paper is a blueprint for a given fence?
The degree of reasonableness comes from stuff like a 5001-post fence and a 4999-post fence both being possible. If induction were rock solid then you would very fast or immediately believe in an infinite-length fence. But induction is unreliable and points in a different direction than just checking whether each post is there. Yet we often find ourselves in a situation where we have made some generalization, checked it some but not exhaustively, and would like to call our epistemic state “knowing” the fact.