Rationality is surely bigger than Bayes—since it includes deductive reasoning.
Well, Solomonoff induction and systems like AIXI are bigger than Bayes, since they use it as a part of themselves. They are also intractable.
And I'd guess there's a link between those and rationality: epistemic and instrumental rationality respectively, pushed to their theoretical limits of optimality.
This can be viewed the other way around: deductive reasoning as a special case of Bayes.
Exactly: the special case where the conditional probabilities are (practically) 0 or 1.
yes, exactly
And induction is a special case of deduction, since probability theory itself is a logic with theorems: what a given prior updates to, on given evidence, is a deductive mathematical fact.
Besides, I’m informed that I just use duction.
No, see:
http://en.wikipedia.org/wiki/Problem_of_induction
Tim--- To resolve your disagreement: Induction is not purely about deduction, but it nevertheless can be completely modelled by a deductive system.
More specifically, I agree with your claim about induction (see point 4 above). However, in defense of Eliezer's claim that induction is a special case of deduction, I think you can model it in a deductive system even though induction might require additional assumptions. For one thing, deduction in practice seems to me to require empirical assumptions as well (i.e., the "axioms" and "inference rules" are chosen based on how right they seem), so the fact that induction needs some axioms should not itself prevent deductive-style proofs using an appropriately formalized version of it. So, once one settles on some axioms, such as the desiderata I list above for a Solomonoff-like system, you CAN describe via a mathematical deduction system how the process of induction would proceed. So, induction can be formalized and proofs can be made about the best thing for an agent to do; the AIXI model is basically an example of this.
If that is a defense of induction being a special case of deduction, then it’s a defense of anything being a special case of deduction—since logic can model anything.
The golden gate bridge is a special case of deduction, in this sense.
I am not impressed by the idea that induction is a special case of deduction—I would describe it as being wrong. You need extra axioms for induction. It is not the same thing at all.
Yes, the golden gate bridge is a special case of deduction in the sense meant here. I have no problem with anything in your comment, I think we agree.
Induction tells us what is probable: based on past experience we can make a prediction about the future. But applying induction to decide something is itself a deduction:
First, make the assumption that induction can be applied to infer truth. Then, apply induction. The result is a valid conclusion deduced using (1) induction and (2) the belief that you can use induction.
To recap… induction is not a purely deductive principle—since it relies on an axiom known as “The Principle of Uniformity of Nature”—http://en.wikipedia.org/wiki/Principle_of_uniformity which states that the laws of physics are the same from place to place and that the past is a useful guide to the future.
That axiom is not available as a result of any deduction—and attempts to justify it always seem to be circular—i.e. they use induction.
According to http://en.wikipedia.org/wiki/Problem_of_induction#Ancient_origins this problem has been known about for over 2,000 years.
It looks to me like those uniformity of nature principles would be nice but that induction could still be a smart thing to do despite non-uniformity. We’d need to specify in what sense uniformity was broken to distinguish when induction still holds.
Right. We only assume uniformity for the same reason we assume all emeralds are green and not grue. It's just the simpler hypothesis. If we had reason to think that the laws of physics alternated like a checkerboard, or that colors magically changed in 2012, then we'd just have to take that into account.
This reminds me of the Feynman quote “Philosophers say a great deal about what is absolutely necessary for science, and it is always, so far as one can see, rather naive, and probably wrong.”
I agree with Jimmy's examples. Tim, the Solomonoff model may have some other fine-print assumptions (see some analysis by Shane Legg), but "the earth having the same laws as space" or "laws not varying with time" are definitely not needed for the optimality proofs of the universal prior (though of course, to your point, uniformity does make our induction in practice easier, and time and space translation invariance of physical law do appear to be true, AFAIK). Basically, assuming the universe is computable is enough to get the optimality guarantees. This doesn't mean you might not still be wrong if Mars does in fact change the rules you've learned on Earth, but it still provides a strong justification for using induction even if you were not guaranteed that the laws were the same, until you observed Mars to have different laws, at which point you would assign the largest weight to the simplest joint hypothesis for your next decision.
I’m afraid that you’re assuming what you’re trying to prove: whether you call it uniformity, or simplicity, or order, it’s all the same assumption, and you do have to assume it, whatever Feynman says.
Look at it from a Bayesian point of view: if your prior for the universe is that every sequence of Universe-states is equally likely, then the apparent order of the states so far gives no weight at all to more orderly future states—in fact, no observation can change what we expect.
Incidentally I’m very confident of the math in the paragraph above, and I’d ask that you’d be sure you’ve taken in what I’m getting at there in your reply.
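To make the point concrete, here is a minimal sketch (the toy universe and function names are mine, purely for illustration): with a flat prior over all length-4 bit sequences, the posterior predictive for the next bit stays at 0.5 no matter how orderly the observed prefix looks.

```python
from itertools import product

# Toy universe: a "history" is a sequence of 4 bits. Under a flat prior,
# every one of the 2^4 possible sequences is equally likely a priori.
sequences = list(product([0, 1], repeat=4))
prior = {seq: 1.0 / len(sequences) for seq in sequences}

def predictive(observed):
    """P(next bit = 1 | observed prefix) under the flat prior."""
    consistent = [s for s in sequences if s[:len(observed)] == tuple(observed)]
    total = sum(prior[s] for s in consistent)
    next_one = sum(prior[s] for s in consistent if s[len(observed)] == 1)
    return next_one / total

# However orderly the past, the flat prior predicts 0.5 for the next bit.
print(predictive([1, 1, 1]))  # 0.5
print(predictive([0, 1, 0]))  # 0.5
```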
There are many more complex possible universes than simple ones, so the assumption that an individual simple universe is more probable than an individual complex universe (which is the assumption being made here) is not the same as the assumption that all simple universes taken together are more probable than all complex universes taken together (i.e., the assumption that the universe is probably simple). (Not saying you disagree, but it's probably good to be careful about the distinction.)
I suspect I'm going to be trying to make this point again at some point—I've had difficulty in the past explaining the problem of induction, and though I know about Solomonoff induction I only realised today that the whole problem is all about priors. I tried to be explicit about which side of the distinction you draw I was speaking of, but any thoughts on how I can make it clearer in future? Thanks!
Ciphergoth, I agree with your points: that if your prior over world-states were not induction-biased to start with, you would not be able to reliably use induction, and that this is a type of circularity. Also of course, the universe might just be such that the Occam prior doesn't make you win; there is no free lunch, after all.
But I still think induction could meaningfully justify itself, at least in a partial sense. One possible, though speculative, pathway: Suppose Tegmark is right and all possible math structures exist, and that some of these contain conscious sub-structures, such as you. Suppose further that Bostrom is right and observers can be counted to constrain empirical predictions. Then it might be that there are more beings in your reference class that are part of simple mathematical structures as opposed to complex mathematical structures, possibly as a result of some mathematical fact about your structure and how that logically inter-relates to all possible structures. This might actually make something like induction true about the universe, without it needing to be a direct assumption. I personally don’t know if this will turn out to be true, nor whether it is provable even if true, but this would seem to me to be a deep, though still partially circular, justification for induction, if it is the case.
We're not fully out of the woods even if all of this is true, because one still might want to ask Tegmark "Why does literally everything exist rather than something else?", to which he might want to point to an Occam-like argument that "Everything exists" is algorithmically very simple. But these, while circularities, do not appear trivial to my mind; i.e., they are still deep and arguably meaningful connections which seem to lend credence to the whole edifice. Eli discusses in great detail why some circular loops like these might be ok/necessary to use in Where Recursive Justification Hits Bottom.
To a Bayesian, the problem of induction comes down to justifying your priors. If your priors rate an orderly universe as no more likely than a disorderly one, then all the evidence of regularity in the past is no reason to expect regularity in the future—all futures are still equally likely. Only with a prior that weights more orderly universes with a higher probability, as Solomonoff's universal prior does, will you be able to use the past to make predictions.
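As a contrast with the flat-prior case, here is a rough sketch of what a simplicity-weighted prior buys you. The hypothesis class and the complexity numbers are made up for illustration; this is not the actual universal prior, which is uncomputable.

```python
from itertools import product

# Toy hypothesis class: "repeat this pattern forever" hypotheses of length
# 1..3, plus a maximally complex "lawless" hypothesis under which every bit
# is an independent fair coin. Prior weights fall off as 2^(-complexity);
# the complexity numbers are illustrative, not Kolmogorov complexities.
patterns = [p for n in (1, 2, 3) for p in product([0, 1], repeat=n)]
prior = {p: 2.0 ** -len(p) for p in patterns}
prior["lawless"] = 2.0 ** -6
norm = sum(prior.values())
prior = {h: w / norm for h, w in prior.items()}

def likelihood(hyp, observed):
    """P(observed bits | hypothesis)."""
    if hyp == "lawless":
        return 0.5 ** len(observed)
    return float(all(b == hyp[i % len(hyp)] for i, b in enumerate(observed)))

def prob_next_one(hyp, observed):
    """P(next bit = 1 | hypothesis, observed)."""
    if hyp == "lawless":
        return 0.5
    return float(hyp[len(observed) % len(hyp)] == 1)

def predictive(observed):
    """Posterior predictive P(next bit = 1 | observed)."""
    posterior = {h: prior[h] * likelihood(h, observed) for h in prior}
    total = sum(posterior.values())
    return sum(w * prob_next_one(h, observed)
               for h, w in posterior.items()) / total

# Past regularity now shifts the prediction: after 1,1,1,1 the simple
# "all ones" hypothesis carries most of the posterior weight.
print(predictive([1, 1, 1, 1]))  # close to 1, not 0.5
```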
More than that, surely: inductive inference is also built into Bayes’ theorem itself.
Unless the past is useful as a guide to the future, the whole concept of maintaining a model of the world and updating it when new evidence arrives becomes worthless.
As you say, Bayes' theorem isn't useful if you start from a "flat" prior; all posterior probabilities come out the same as prior probabilities, at least if A is in the future and B in the past. But nothing in Bayes' theorem itself says that it has to be useful.
Right. In case anyone thinks this thread is an argument, it’s not—the assumption of induction would need to be added to deduce anything about the empirical world. The definition above didn’t say how deductions would be made… You just make assumptions and then keep track of what your conclusions would be given those assumptions (that’s deduction). I’m not sure if we could or would start listing the assumptions. I made the mistake of including (1), which is the only explicit assumption, but AndySimpson and ALexU have pointed out that elevating that assumption is empiricism.
agreed, drawing hands
By “Bayes” I meant this: http://en.wikipedia.org/wiki/Bayes’_theorem—a formalisation of induction.
If you think “Bayes” somehow includes deductive reasoning, can you explain whether it supposedly encapsulates first-order logic or second-order logic?
I think we’re probably using some words differently, and that’s making you think my claim that deductive reasoning is a special case of Bayes is stronger than I mean it to be.
All I mean, approximately, is:
Bayes theorem: p(B|A) = p(A|B)*p(B) / p(A)
Deduction : Consider a deductive system to be a set of axioms and inference rules. Each inference rule says: “with such and such things proven already, you can then conclude such and such”. And deduction in general then consists of recursively turning the crank of the inference rules on the axioms and already generated results over and over to conclude everything you can.
Think of each inference rule "i" as i(A) = B, where A is some set of already established statements and B corresponds to what statements "i" lets you conclude, if you already have A.
Then, by deduction we’re just trying to say that if we have generated A, and we have an inference rule i(A) = B, then we can generate or conclude B.
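In code, that crank-turning picture might look something like this (a toy sketch; the statements and the rule format are placeholders of my own):

```python
# A minimal sketch of "turning the crank": start from the axioms and apply
# every inference rule to what has been generated so far, until nothing new
# appears. Rules are written as (premises, conclusion) pairs.
def deductive_closure(axioms, rules):
    generated = set(axioms)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= generated and conclusion not in generated:
                generated.add(conclusion)
                changed = True
    return generated

axioms = {"p", "p -> q", "q -> r"}
rules = [
    ({"p", "p -> q"}, "q"),   # modus ponens instances, spelled out
    ({"q", "q -> r"}, "r"),
]
print(deductive_closure(axioms, rules))  # {'p', 'p -> q', 'q -> r', 'q', 'r'}
```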
The connection between deduction and Bayes is to take the generated "proofs" of the deductive system as those things to which you assign probability of 1 using Bayes.
So, the inference rule corresponds to the fact that p(B | A) = 1. The fact that A has been already generated corresponds to p(A) = 1. Also, since A has already been generated independently of B, p(A | B) = 1, since A didn’t need B to be generated. And we want to know what p(B) is.
Well, plugging into Bayes:
p(B|A) = p(A|B) * p(B) / p(A), i.e. 1 = 1 * p(B) / 1, i.e. p(B) = 1.
In other words, B can be generated, which is what we wanted to show.
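Numerically, the same plug-in, just to make the limiting case explicit (a trivial sketch; the function name is mine):

```python
# Toy version of the correspondence above: the inference rule gives
# p(B|A) = 1, and A being already generated (independently of B) gives
# p(A) = 1 and p(A|B) = 1. Rearranging Bayes' theorem for p(B):
#     p(B|A) = p(A|B) * p(B) / p(A)   =>   p(B) = p(B|A) * p(A) / p(A|B)
def p_b_from_bayes(p_b_given_a, p_a, p_a_given_b):
    return p_b_given_a * p_a / p_a_given_b

# With all three inputs at 1 (no uncertainty), the conclusion B comes out
# with probability 1, i.e. it is "generated" by the deductive system.
print(p_b_from_bayes(p_b_given_a=1.0, p_a=1.0, p_a_given_b=1.0))  # 1.0
```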
So basically, I think of deductive reasoning as just reasoning with no uncertainty, and I see that as popping out of Bayes in the limiting case. If a certain formal interpretation of this leads me into Gödelian problems, then I would just need to weaken my claim somewhat, because some useful analogy is clearly there in how the uncertain reasoning of Bayes reduces to certain conclusions in various limits of the inputs (p=0, p=1, etc.).
I think I would describe what you are talking about as being Bayesian statistics—plus a whole bunch of unspecified rules (the “i” s).
What I was saying is that there isn't a standard set of deductive-reasoning axioms and rules that is considered to be part of Bayesian statistics. I would not dispute that you can model deductive reasoning using Bayesian statistics.
Tim: Good, your distinction sounds correct to me.