Science is built around the assumption that you’re too stupid and self-deceiving to just use Solomonoff induction.
We can’t use Solomonoff induction—because it is uncomputable.
We don’t have any good quality computable approximations to it either. That is indeed because we are too stupid. That is more fact than assumption, though.
Generating hypotheses is uncomputable. However, once you have a candidate hypothesis, if it explains the observations you can do a computation to verify that, and you can always measure its complexity. So you’ll never know that you have the best hypothesis, but you can compare hypotheses for quality.
I’d really like to know if there’s anything to be known about the nature of the suboptimal predictions you’ll make if you use suboptimal hypotheses, since we’re pretty much certain to be using suboptimal hypotheses.
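To make the "verify, then measure" step concrete, here is a minimal sketch of my own, not anything from the OP. It assumes a toy setting where a hypothesis is a Python expression and the observations are a string. The names `description_length`, `explains` and `best_hypothesis` are made up for illustration. The length of the program text is only a computable upper bound on the true complexity, which is the point: you can rank the candidates you have without ever knowing whether a shorter one exists.

```python
def description_length(program: str) -> int:
    """Length of the program text in bytes: a computable upper bound on complexity."""
    return len(program.encode("utf-8"))


def explains(program: str, observations: str) -> bool:
    """Run the candidate and check that it reproduces the observations exactly."""
    # Assumption: candidates are trusted Python expressions that terminate.
    # In general this check is where the halting problem bites.
    return eval(program, {"__builtins__": {}}) == observations


def best_hypothesis(candidates, observations):
    """Among the candidates that reproduce the data, return the shortest one."""
    valid = [c for c in candidates if explains(c, observations)]
    return min(valid, key=description_length) if valid else None


observations = "01" * 5000

candidates = [
    repr(observations),  # hypothesis A: spell the data out literally
    '"01" * 5000',       # hypothesis B: a short rule that generates it
    '"10" * 5000',       # hypothesis C: short, but does not fit the data
]

winner = best_hypothesis(candidates, observations)
print(winner, description_length(winner))
```

Hypothesis C is rejected by the verification step and A loses to B on length. Nothing here tells you that B is the shortest possible description, only that it beats the alternatives on the table.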
What do you mean by this? Surely not that it is uncomputable in the theoretical sense? Minimum Description Length and Kolmogorov Complexity are most definitely computable for a wide range of problems and are widely used for model selection, especially model order selection, in pattern theory, decision theory, and machine learning. These are equivalent to Solomonoff Induction. Is there some special case problem to which you refer? These things may not be computable in some cases, but they are generally computable.
Solomonoff induction is uncomputable. So is Kolmogorov complexity; computing it is in fact equivalent to the halting problem. Could you give some references for your claims?
Yes, you can’t compute the specific value of K. But I’m asking why anyone thinks that is relevant to the use of Kolmogorov complexity for distinguishing hypotheses. If someone specifies the Mandelbrot set graphically, and I specify it with a short algorithm, then my explanation for the set wins based on minimum description length. Maybe my algorithm for it doesn’t achieve the “true” value of K for the Mandelbrot set, but why does this matter? When I say that these things are computable for a wide range of problems, I don’t mean that they violate the equivalence with the halting problem. I mean that you can compare hypotheses and distinguish them by noting which has smaller description length. The example that comes to mind is the classic example of a Gaussian mixture model. How many components should it have? It’s a trade-off between having a small number of components and being able to match the data. If several hypotheses can all match the data equally well, then the one with the fewest components wins. It is in this sense that the OP above mentions how Science doesn’t trust us to use Solomonoff induction.
It doesn’t mean anything to say something like: “We can’t use Solomonoff induction—because it is uncomputable.” To me, this is like saying we can’t distinguish between which algorithm for a given problem is more complex because we don’t know if P = NP, or something along these lines. The practical use of Solomonoff induction (i.e. doing computations with it) has little to do with its non-computability in the theoretical sense. See Bayesian information criterion.
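For the mixture-model example, a rough sketch of what model order selection with the Bayesian information criterion looks like in practice. Everything in it is an assumption made for illustration: the data are synthetic draws from two well-separated 1-D Gaussians, `fit_gmm` is a bare-bones EM loop rather than a library routine, and the parameter count 3k - 1 (k means, k variances, k - 1 free weights) is the usual convention for a 1-D mixture. The score used is BIC = p ln(n) - 2 ln(L), lower being better.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def fit_gmm(data, k, iters=200, var_floor=1e-2):
    """Fit a k-component 1-D mixture by EM; return the log-likelihood
    measured at the final E-step."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    loglik = 0.0
    for _ in range(iters):
        # E-step: responsibilities, using log-sum-exp for numerical stability.
        resp, loglik = [], 0.0
        for x in xs:
            logs = [math.log(weights[j]) + log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += norm
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means and variances (with floors).
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, var_floor)
            weights[j] = max(nj / n, 1e-12)
    return loglik


def bic(loglik, k, n):
    """BIC = p * ln(n) - 2 * ln(L), with p = 3k - 1 free parameters."""
    return (3 * k - 1) * math.log(n) - 2 * loglik


rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(150)] + [rng.gauss(8, 1) for _ in range(150)]

scores = {k: bic(fit_gmm(data, k), k, len(data)) for k in (1, 2, 3)}
best = min(scores, key=scores.get)
print(scores, best)
```

A single Gaussian fits bimodal data so badly that its score is far worse than the others. Whether a third component pays for its three extra parameters is exactly the trade-off the penalty term is there to settle. None of this requires computing K; it only requires comparing the candidates you actually have.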
Sure it does: that was just stating a true fact.
I disagree that the statement of that fact is relevant to the OP above. Further, it’s not a true fact. We can use Solomonoff induction despite the fact that it’s not computable. Even just its mere conceptual analysis is useful for driving the understanding of approximations. I’m not critical of the noncomputability part of the statement; I am critical of the logical leap in going from its noncomputability to the claim that we can’t use it.
I recommend distinguishing between “using Solomonoff induction” and “using the idea of Solomonoff induction”. We can do the latter, but not the former.
If you want to talk about “computable approximations to Solomonoff induction” it is probably best to spell that concept out as well—or else use a term like “general-purpose forecaster”.
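In a general discussion about Solomonoff induction, I’d agree with you. But based upon Eli’s claim in the OP: “Science is built around the assumption that you’re too stupid and self-deceiving to just use Solomonoff induction. After all, if it was that simple, we wouldn’t need a social process of science… right?”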
I don’t think it’s reasonable to bring the noncomputability of Solomonoff induction into this. That has nothing to do with why we build things around the assumption that individuals won’t correctly utilize Occam’s razor. It’s a curmudgeonly nag to critique this statement by dragging computability theory into it, and it doesn’t address the spirit of the arguments about Science. If you have Hypothesis A and Hypothesis B and B is simpler and fits the data just as well (all else being equal), then (by appeal to Solomonoff induction) it’s rational to choose B. But here in Science we don’t see this. We cling to old theories out of stigma, like clinging to Copenhagen and requiring that MW bring forth overwhelming new experiments to refute Copenhagen.
The point of the OP is that yes, we need to distrust that others will correctly put aside biases and use Occam’s razor correctly when distinguishing hypotheses. However, one can get carried away with this and transform it into a sort of bias itself; one in which demonstrably reasonable arguments are not paid attention to simply because they are new or because they address interpretations of prior results rather than presenting new experiments that visibly distinguish between themselves and the old hypotheses.
The tendency to cling to old hypotheses is not grounded in the accusation that Occam’s razor (Solomonoff induction) is itself a deficient way to view the problems. It’s grounded in what might be called scientific inertia. If science is in motion in the direction of Theory A, there’s an irrational sluggishness in suddenly jumping ship to Theory B. Solomonoff induction or Occam’s razor or Kolmogorov complexity or Minimum Description Length are attempts to be truly optimal about jumping ship between ideas. You shouldn’t easily accept any reasonable-sounding argument. But you shouldn’t dismiss reasonable-sounding arguments just because a famous physicist didn’t invent them 30 years ago and they have subsequently enjoyed 30 years of seeming to be the best explanation. Science with a capital S tends to err in the latter sense while individuals tend to err in the former sense. Bayesianism is a framework to attempt not to err at all, and certainly not to err in a systematic and easily detectable way.
I just don’t see how these fancy-pants distinctions about Solomonoff induction bear on that issue, which is the issue set forth in the OP.
There were two criticisms:
We can’t use Solomonoff induction—because it is uncomputable.
The “scientific assumption” that we are too stupid to use a sophisticated approximation of Solomonoff induction is more like a true fact: we can’t do that—and that is essentially because we are too stupid to know how to do it.
The idea of “Science” as incapable of using Occam’s razor seems like a bit of a straw man. I learned to use Occam’s razor by studying science. Distinguishing between scientific theories is listed as the first application of the razor here.
In my experience, it is rare that someone has a legitimate background in the appropriate use of Occam’s razor. I do not agree that this is a straw man. I think you’re also conflating two issues. I see it as an issue that individuals are not able to overcome biases well enough to reliably use Occam’s razor when promoting solutions to problems. The scientific community as a whole is much more successful in doing this, and no one (neither I nor the OP) disagrees. But a separate issue arises, which is that the scientific community tends to simply fail to evaluate whether or not a proposed theory wins (in the Occam’s razor sense) unless there is a tremendous stack of easily visible experimental evidence to motivate such an evaluation. This is a major reason why single-world views have persisted for so long. Few cling to single-world views because they are “favorite pet theories” (which would classify such an error into the appropriate-use-of-Occam’s-razor type). More often, alternative explanations are simply not considered at all because they don’t have the temporally aggregated endorsement of the scientific community.
If Eliezer walked up to Sir Roger Penrose and presented a great argument about why the explanation of consciousness due to quantum gravity was just a mysterious answer to a mysterious question, and Penrose replied with something like, “Come back and talk to me when you’ve got 20 years’ worth of experimental evidence on your side… I don’t want to hear about your retroactive interpretations… it’s not worth my time if there’s not a mountain of evidence to persuade me to update to any new position”, this would be the type of mistake that the OP is trying to point out. And as a grad student at an R-1 university, I can tell you this is anything but a straw man. People go around not updating their maps all the time, and their reasoning is just that until some new interpretation is overwhelmingly salient in terms of a flurry of brand new experimental insights, they won’t even consider that it exists. That’s a serious problem from a Bayesian perspective. And as the turnaround time for scientific results shortens, those willing to update sooner will have a distinct advantage.
Finally, I do not understand how you can say that “We can’t use Solomonoff induction—because it is uncomputable” is a “criticism” with respect to the ideas in the OP. The OP has absolutely nothing to do with the computability of Solomonoff induction. We can use it in the sense that you mentioned when you said: “Distinguishing between scientific theories is listed as the first application of the razor here.”
That’s great that it’s listed there, much as it has been repeatedly listed and emphasized in major discussions for the last 30 years. But many factors are preventing that from trickling down to the work of actual scientists.
You are saying reasonable things, yet at the same time engaging in a stupid argument resulting from ambiguous use of words. Please don’t do that.