I’ve read some of this Universal Induction article. It seems to operate from flawed premises.
“If we prescribe Occam’s razor principle [3] to select the simplest theory consistent with the training examples and assume some general bias towards structured environments, one can prove that inductive learning ‘works’. These assumptions are an integral part of our scientific method. Whether they admit it or not, every scientist, and in fact every person, is continuously using this implicit bias towards simplicity and structure to some degree.”
Suppose the brain uses algorithms, which is an uncontroversial supposition. From a computational point of view, the passage quoted above is like saying: “In order for a computer to not run a program, such as Indiana Jones and the Fate of Atlantis, the computer must be executing some command to the effect of DoNotExecuteProgram(‘IndianaJonesAndTheFateOfAtlantis’).”
That’s not how computers operate. They just don’t run the program. They don’t need a special process for not running the program. Instead, not running the program is “implicitly contained” in the state of affairs that the computer is not running it. But this notion of implicit containment makes no sense for the computer. There are infinitely many programs the computer is not running at a given moment, so it can’t process the state of affairs that it is not running any of them.
Likewise, the use of an implicit bias towards simplicity cannot be meaningfully conceptualized by humans. In order to know how this bias simplifies everything, one would have to know what information regarding “everything” the bias omits. But if we knew that, the bias would not exist in the sense the author intends it to exist.
Furthermore:
“This is in some way a contradiction to the well-known no-free-lunch theorems which state that, when averaged over all possible data sets, all learning algorithms perform equally well, and actually, equally poorly [11]. There are several variations of the no-free-lunch theorem for particular contexts but they all rely on the assumption that for a general learner there is no underlying bias to exploit because any observations are equally possible at any point. In other words, any arbitrarily complex environments are just as likely as simple ones, or entirely random data sets are just as likely as structured data. This assumption is misguided and seems absurd when applied to any real world situations. If every raven we have ever seen has been black, does it really seem equally plausible that there is equal chance that the next raven we see will be black, or white, or half black half white, or red etc. In life it is a necessity to make general assumptions about the world and our observation sequences and these assumptions generally perform well in practice.”
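To make the averaging claim in the quoted passage concrete, here is a minimal Python sketch. It is my own toy illustration, not something from the article: it assumes that “all possible data sets” means every binary sequence of length n counted equally, and the three predictors and all function names are made up for the example. Under that assumption, every deterministic next-bit predictor scores exactly one half on average, while on one particular structured sequence the predictors differ.

```python
from itertools import product

# Three hypothetical next-bit predictors. Each receives the bits seen so far
# (a tuple of 0s and 1s) and returns a guess for the next bit.
def always_zero(history):
    return 0

def repeat_last(history):
    return history[-1] if history else 0

def majority(history):
    return 1 if sum(history) * 2 > len(history) else 0

def accuracy_on(predictor, seq):
    """Fraction of positions in one sequence where the predictor guesses right."""
    hits = sum(predictor(tuple(seq[:i])) == seq[i] for i in range(len(seq)))
    return hits / len(seq)

def average_accuracy(predictor, n):
    """Mean accuracy over ALL 2**n binary sequences, each counted equally."""
    sequences = list(product((0, 1), repeat=n))
    return sum(accuracy_on(predictor, s) for s in sequences) / len(sequences)

if __name__ == "__main__":
    for p in (always_zero, repeat_last, majority):
        # Uniform average over every sequence of length 8.
        print(p.__name__, "average:", average_accuracy(p, 8))
    # A single structured sequence: here the choice of predictor matters.
    alternating = (0, 1) * 4
    for p in (always_zero, repeat_last, majority):
        print(p.__name__, "on 01010101:", accuracy_on(p, alternating))
```

The one-half average holds because, for any prefix, exactly half of the equally weighted continuations agree with whatever the predictor guessed. The uniform weighting over sequences is the assumption doing all the work.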
The author says that there are variations of the no-free-lunch theorem for particular contexts. But he goes on to treat the no-free-lunch theorem as if it meant something independent of context. What could that possibly be? Also, such notions as “arbitrary complexity” or “randomness” seem intuitively meaningful, but what is their context?
The problem is, if there is no context, the solution cannot be proven to address the problem of induction. But if there is a context, it addresses the problem of induction only within that context. Then philosophers will say that the context was arbitrary, and formulate the problem again in another context where previous results will not apply.
In a way, this makes the problem of induction seem like a waste of time. But the real problem is about formalizing the notion of context in such a way that it becomes possible to identify ambiguous assumptions about context. That would be what separates scientific thought from poetry. In science, ambiguity is not desired and should therefore be identified. But philosophers tend to place little emphasis on this, and instead spend time dwelling on problems they should, in my opinion, recognize as unsolvable due to ambiguity of context.
The omitted information in this approach is information with a high Kolmogorov complexity, which is omitted in favor of information with low Kolmogorov complexity. A very rough analogy would be to describe humans as having a bias towards ideas expressible in a few words of English over ideas that need many words of English to express. Using Kolmogorov complexity for sequence prediction, instead of English for ideas, gets rid of the many problems of rigor involved in the latter, but the basic idea is pretty much the same. You look into things that are briefly expressible before things that must be expressed at length. The information isn’t permanently omitted, it’s just deprioritized. The algorithm doesn’t start looking at the stuff you need long sentences to describe before it has convinced itself that there are no short sentences that describe the observations it wants to explain in a satisfactory way.
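A minimal sketch of that prioritization, in Python. This is a toy stand-in and not Solomonoff induction itself: I assume the only hypotheses are of the form “repeat this bit pattern forever”, I use pattern length as the description length, and I weight each hypothesis by 2 to the power of minus its length. All names here are hypothetical.

```python
from itertools import product

def hypotheses(max_len):
    """Yield (pattern, weight) pairs; each pattern means 'repeat me forever'.

    Shorter patterns get exponentially more weight: 2 ** -len(pattern).
    """
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits), 2.0 ** -length

def generates(pattern, observed):
    """True if repeating `pattern` reproduces every observed bit so far."""
    return all(observed[i] == pattern[i % len(pattern)]
               for i in range(len(observed)))

def predict_next(observed, max_len=6):
    """Probability that the next bit is '1' under the length-weighted mixture."""
    mass = {"0": 0.0, "1": 0.0}
    for pattern, weight in hypotheses(max_len):
        if generates(pattern, observed):
            next_bit = pattern[len(observed) % len(pattern)]
            mass[next_bit] += weight
    total = mass["0"] + mass["1"]
    return mass["1"] / total if total else 0.5

if __name__ == "__main__":
    # After "0101" the short pattern "01" dominates, so '1' gets low probability.
    print(predict_next("0101"))
    # With no observations the hypotheses are symmetric between '0' and '1'.
    print(predict_next(""))
```

Longer patterns that disagree with “01” about the next bit are never thrown away while they fit the data. They just carry little weight until the short explanations have been ruled out.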
One bit of context that is assumed is that the surrounding universe is somewhat amenable to being Kolmogorov-compressed. That is, there are some recurring regularities that you can begin to discover. The term “lawful universe” sometimes thrown around in LW probably refers to something similar.
Solomonoff’s universal induction would not work in a completely chaotic universe, where there are no regularities for Kolmogorov compression to latch onto. You’d also be unlikely to find any sort of native intelligent entities in such universes. I’m not sure if this means that the Solomonoff approach is philosophically untenable, but needing some discoverable regularities to exist before induction can discover them doesn’t strike me as that great a requirement.
If the problem of context is about exactly where you draw the data for the sequence which you will then try to predict with Solomonoff induction, then in a lawless universe you wouldn’t be able to infer things no matter which simple instrumentation you picked. In a lawful universe you could pick all sorts of instruments (tracking the change of light over time, tracking temperature, or tracking the luminosity of the Moon, for simple examples), and you’d start getting Kolmogorov-compressible data in which the induction system could start finding repeating periods.
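A rough way to see the difference between the two kinds of data, sketched in Python. Kolmogorov complexity is not computable, so this uses the size of zlib-compressed output as a crude, computable upper-bound proxy; that substitution is my assumption and not part of the paper. The “instrument readings” are invented: one series cycles with a period of 24, the other comes from a seeded pseudo-random generator standing in for a lawless source.

```python
import random
import zlib

def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means more regular)."""
    return len(zlib.compress(data, 9)) / len(data)

def periodic_readings(n: int, period: int = 24) -> bytes:
    """Hypothetical instrument whose reading repeats every `period` samples."""
    return bytes(i % period for i in range(n))

def noisy_readings(n: int, seed: int = 0) -> bytes:
    """Stand-in for a lawless source: seeded pseudo-random bytes."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))

if __name__ == "__main__":
    print("periodic:", compressed_ratio(periodic_readings(4096)))
    print("noise:   ", compressed_ratio(noisy_readings(4096)))
```

The periodic series shrinks to a small fraction of its size, while the noise does not shrink at all. Note the irony that the pseudo-random bytes actually have a short description (the generator plus its seed), which zlib cannot find. That is one reason a general-purpose compressor is only a loose proxy.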
The core thing “independent of context” in all this is that every universal induction system is reduced to taking a series of numbers as input and trying to develop an efficient predictor for what the next number will be. The argument in the paper is that this construction is sufficient for all the interesting things an induction solution could do, and that the various real-world cases where induction is needed can be reduced to such a system by describing the instrumentation which turns real-world input into a time series of numbers.
Okay. In this case, the article does seem to begin to make sense. Its connection to the problem of induction is perhaps rather thin. The idea of using low Kolmogorov complexity as justification for an inductive argument cannot be deduced as a theorem of something that’s “surely true”, whatever that might mean. And if it were taken as an axiom, philosophers would say: “That’s not an axiom. That’s the conclusion of an inductive argument you made! You are begging the question!”
However, it seems that advances in computation theory have made people able to do at least remotely practical work in areas that resemble more inert philosophical ponderings. That’s good, and this article might even be used as justification for my theory RP, given that the use of Kolmogorov complexity is accepted. I was not familiar with the concept of Kolmogorov complexity despite having heard of it a few times, but my intuitive goal was to minimize the theory’s Kolmogorov complexity by removing arbitrary declarations and favoring symmetry.
I would say that there are many ways of solving the problem of induction. Whether a theory is a solution to the problem of induction depends on whether it covers the entire scope of the problem. I would say this article covers half of the scope. The rest is not covered, to my knowledge, by anyone other than Robert Pirsig and experts on Buddhism, but these writings are very difficult to approach analytically. Regrettably, I am still unable to publish the relativizability article, which is intended to succeed in the analytic approach.
In any case, even though the widely rejected “statistical relevance” and this “Kolmogorov complexity relevance” share the same flaw, if presented as an explanation of inductive justification, the approach is interesting. Perhaps, even, this paper should be titled: “A Formalization of Occam’s Razor Principle”. Because that’s what it surely seems to be. And I think it’s actually an achievement to formalize that principle—an achievement more than sufficient to justify the writing of the article.
“When artificial intelligence researchers attempted to capture everyday statements of inference using classical logic they began to realize this was a difficult if not impossible task.”
I hope nobody’s doing this anymore. It’s obviously impossible. “Everyday statements of inference”, whatever that might mean, are not exclusively statements of first-order logic, because Russell’s paradox is simple enough to be formulated by talking about barbers. The liar paradox is also expressible with simple, practical language.
Wait a second. Wikipedia already knows this stuff is a formalization of Occam’s razor. One article seems to attribute the formalization of that principle to Solomonoff, another one to Hutter. In addition, Solomonoff induction, which is essential to both, is not computable. Ugh. So Hutter and Rathmanner actually have the nerve to begin that article by talking about the problem of induction, when the goal is obviously to introduce concepts of computation theory? And they are already familiar with Occam’s razor, and aware of it having, at least probably, been formalized?
Okay then, but this doesn’t solve the problem of induction. They have not even formalized the problem of induction in a way that accounts for the logical structure of inductive inference and leaves room for various relevance operators to be applied. Nobody else has done that either, though. I should get back to this later.