I think the point is even stronger than that—Solomonoff induction requires not just infinite compute/time but doing something literally logically impossible—the prior is straight up uncomputable, not in any real-world tractability sense but as uncomputable as the Halting problem is. There’s a huge qualitative gulf between “we can’t solve this problem without idealised computers with unbounded time” and “we can’t solve this on a computer by definition”. Makes a huge difference to how much use the approach is for “crispening” ideas IMO
Yup, actual Solomonoff induction is uncomputable. I’m not sure what you mean by “not just infinite compute/time”, though; given truly infinite computation you absolutely could do it. (Though in a world where that was possible, you’d really want your Solomonoff inductor to consider possible explanations that likewise require an infinite amount of computation, and then you’d be back with the same problem again.) I guess the distinction you’re making is between “requires a finite but absurdly large amount of computation” and “requires literally infinite computation”, and I agree that the latter is what Solomonoff induction requires.
I think that reduces the credibility of the claim “Solomonoff induction is the One True Way to do inference, at least ideally speaking”. But I think the following weaker proposal is intact:
Given two rival explanations of our observations, in the happy (and pretty much unheard-of) case where they have both been expressed precisely in terms of computer programs, all else equal we should consider the shorter program “simpler”, “better”, and “more likely right”.
One way for all else not to be equal is if one program is shorter than the other just because it’s been more highly optimized in some way that doesn’t have much to do with the actual theory it embodies. So rather than “shorter program better” we should really say something more like “shorter program, after making it as small as possible, better”.
Obviously, coming up with an actual program that perfectly explains all our observations is unrealistic; those observations include reading scientific papers, so it seems like the program would need to include a complete Theory Of Everything in physics; those observations include interactions with other humans, so it seems like the program would need to include at least as much intelligence as any person we encounter; these are both famously hard problems that the human race has not yet cracked.
But given two proposals for explaining the same subset of our experience, if we are credibly able to reason about which corresponds (after optimization) to the shorter program, we should prefer the proposal that seems like it has the shorter program.
In many real cases, it will be completely unclear which of two proposals corresponds to the shorter program after optimization, and in that case we’ll have to use some other heuristic. And even in cases where it seems pretty clear which of two proposals corresponds to the shorter program, we need to be aware that we could be very wrong because what I’ve blithely called “optimization” is uncomputable and we can never be sure there isn’t a shorter program. And it’s not like the gods have handed down to us conclusive reason to think that simpler programs really are more likely to be right; it’s just that heuristics along those lines have worked out pretty well for us, and concepts of “simpler” nearer to “shorter program” and further from “seems simpler to naive human brain” generally seem to work better. So (unless there’s some clever point I’m missing, which there might be) some of EY’s more dramatic claims about how anything other than Solomonoff induction is Wrong and Biased and Irrational seem overblown. But if you treat it as a very promising heuristic rather than a definitely known truth I’m still on board.
Also, you totally can do something that seems to me sufficiently along the lines of Solomonoff induction with an amount of computation that’s merely absurdly large, rather than merely infinite. At step N you try out all programs of length up to N running for up to N steps, and see whether they produce output consistent with your observations to date. Once N gets very large you have discovered things like “among programs shorter than a gigabyte running for less than 10^100 cycles and producing output consistent with these observations, these are the shortest ones, and if we weight them SI-style then the probability distribution for our next observation is this”. And if (as it seems to me kinda plausible that you actually should) you also somehow regard more expensive computations as less probable, then once N gets large enough that you have any candidate programs that produce the right predictions you can start putting upper bounds on the errors in your predictions compared with an ideal no-longer-exactly-Solomonoff inductor, and those errors tend to 0 as the amount of computation you’re willing to do increases.
To be clear, the amount of computation required for any version of this is absurdly impractical in the real world, and if your ideal is actual Solomonff induction then you don’t get any error bounds because you can’t rule out the possibility that some shorter program that hasn’t generated any output yet might do a good job eventually. But it’s not as if you literally can’t do anything Solomonoff-induction-shaped without literally infinite amounts of computation.
I think there’s a sense in which some problems can be uncomputable even with infinite compute no? For example if the Halting problem were computable even with literally infinite time, then we could construct a machine that halted when given its own description iff it ran forever when given its own description. I do think theres a distinction beyond just “arbitrarily large finite compute vs. infinite compute”. It seems like either some problems have to be uncomputable even by a hyper-computer, or else the concept of infinite compute time is less straightforward than it seems
I totally agree on your other points though, I think the concept of bounded Solomonoff induction could be interesting in itself, although I presume with it you lose all the theoretical guarantees around bounded error. Would definitely be interested to see if there’s literature on this
The halting problem is computable with literally-infinite time. But, to be precise, what this means is that a hypercomputer could determine whether a (nonhyper)computer halts; in a universe containing hypercomputers, we would not be very interested in that, and we’d be asking for something that determines whether a given hypercomputer halts (or something like that; I haven’t given much thought to what corresponds to “halting” for any given model of hypercomputation...) which would be impossible for the same sort of reasons as the ordinary halting problem is impossible for ordinary computers.
But I think it’s only fair to describe this by saying “the halting problem is impossible even with infinite computational resources” if you acknowledge that then “the halting problem” isn’t a single problem, it’s a thing that varies according to what computational resources you’ve got, getting harder when you have more resources to throw at it.
I think the point is even stronger than that—Solomonoff induction requires not just infinite compute/time but doing something literally logically impossible—the prior is straight up uncomputable, not in any real-world tractability sense but as uncomputable as the Halting problem is. There’s a huge qualitative gulf between “we can’t solve this problem without idealised computers with unbounded time” and “we can’t solve this on a computer by definition”. Makes a huge difference to how much use the approach is for “crispening” ideas IMO
Yup, actual Solomonoff induction is uncomputable. I’m not sure what you mean by “not just infinite compute/time”, though; given truly infinite computation you absolutely could do it. (Though in a world where that was possible, you’d really want your Solomonoff inductor to consider possible explanations that likewise require an infinite amount of computation, and then you’d be back with the same problem again.) I guess the distinction you’re making is between “requires a finite but absurdly large amount of computation” and “requires literally infinite computation”, and I agree that the latter is what Solomonoff induction requires.
I think that reduces the credibility of the claim “Solomonoff induction is the One True Way to do inference, at least ideally speaking”. But I think the following weaker proposal is intact:
Given two rival explanations of our observations, in the happy (and pretty much unheard-of) case where they have both been expressed precisely in terms of computer programs, all else equal we should consider the shorter program “simpler”, “better”, and “more likely right”.
One way for all else not to be equal is if one program is shorter than the other just because it’s been more highly optimized in some way that doesn’t have much to do with the actual theory it embodies. So rather than “shorter program better” we should really say something more like “shorter program, after making it as small as possible, better”.
Obviously, coming up with an actual program that perfectly explains all our observations is unrealistic; those observations include reading scientific papers, so it seems like the program would need to include a complete Theory Of Everything in physics; those observations include interactions with other humans, so it seems like the program would need to include at least as much intelligence as any person we encounter; these are both famously hard problems that the human race has not yet cracked.
But given two proposals for explaining the same subset of our experience, if we are credibly able to reason about which corresponds (after optimization) to the shorter program, we should prefer the proposal that seems like it has the shorter program.
In many real cases, it will be completely unclear which of two proposals corresponds to the shorter program after optimization, and in that case we’ll have to use some other heuristic. And even in cases where it seems pretty clear which of two proposals corresponds to the shorter program, we need to be aware that we could be very wrong because what I’ve blithely called “optimization” is uncomputable and we can never be sure there isn’t a shorter program. And it’s not like the gods have handed down to us conclusive reason to think that simpler programs really are more likely to be right; it’s just that heuristics along those lines have worked out pretty well for us, and concepts of “simpler” nearer to “shorter program” and further from “seems simpler to naive human brain” generally seem to work better. So (unless there’s some clever point I’m missing, which there might be) some of EY’s more dramatic claims about how anything other than Solomonoff induction is Wrong and Biased and Irrational seem overblown. But if you treat it as a very promising heuristic rather than a definitely known truth I’m still on board.
Also, you totally can do something that seems to me sufficiently along the lines of Solomonoff induction with an amount of computation that’s merely absurdly large, rather than merely infinite. At step N you try out all programs of length up to N running for up to N steps, and see whether they produce output consistent with your observations to date. Once N gets very large you have discovered things like “among programs shorter than a gigabyte running for less than 10^100 cycles and producing output consistent with these observations, these are the shortest ones, and if we weight them SI-style then the probability distribution for our next observation is this”. And if (as it seems to me kinda plausible that you actually should) you also somehow regard more expensive computations as less probable, then once N gets large enough that you have any candidate programs that produce the right predictions you can start putting upper bounds on the errors in your predictions compared with an ideal no-longer-exactly-Solomonoff inductor, and those errors tend to 0 as the amount of computation you’re willing to do increases.
To be clear, the amount of computation required for any version of this is absurdly impractical in the real world, and if your ideal is actual Solomonff induction then you don’t get any error bounds because you can’t rule out the possibility that some shorter program that hasn’t generated any output yet might do a good job eventually. But it’s not as if you literally can’t do anything Solomonoff-induction-shaped without literally infinite amounts of computation.
I think there’s a sense in which some problems can be uncomputable even with infinite compute no? For example if the Halting problem were computable even with literally infinite time, then we could construct a machine that halted when given its own description iff it ran forever when given its own description. I do think theres a distinction beyond just “arbitrarily large finite compute vs. infinite compute”. It seems like either some problems have to be uncomputable even by a hyper-computer, or else the concept of infinite compute time is less straightforward than it seems
I totally agree on your other points though, I think the concept of bounded Solomonoff induction could be interesting in itself, although I presume with it you lose all the theoretical guarantees around bounded error. Would definitely be interested to see if there’s literature on this
The halting problem is computable with literally-infinite time. But, to be precise, what this means is that a hypercomputer could determine whether a (nonhyper)computer halts; in a universe containing hypercomputers, we would not be very interested in that, and we’d be asking for something that determines whether a given hypercomputer halts (or something like that; I haven’t given much thought to what corresponds to “halting” for any given model of hypercomputation...) which would be impossible for the same sort of reasons as the ordinary halting problem is impossible for ordinary computers.
But I think it’s only fair to describe this by saying “the halting problem is impossible even with infinite computational resources” if you acknowledge that then “the halting problem” isn’t a single problem, it’s a thing that varies according to what computational resources you’ve got, getting harder when you have more resources to throw at it.