I think there’s something a bit fishy about the Poe/Shannon analogy.
In order to see what’s wrong with Poe’s argument, what you need (or at least what would have been helpful) is an understanding of how an absurdly large amount of computing power would enable you to solve the problem.
Solomonoff induction assumes not merely an absurdly large amount of compute, but an infinite amount. And even simple approximations to Solomonoff assume far more absurdly large amounts of compute than anything Shannon proposed.
Poe says “no mechanical device can possibly play a decent game of chess, because it would need to look at a variety of different possible sequences of moves, and that’s not a thing that can be done mechanically”. Shannon says two relevant things: first, “here is how, with an unreasonably large amount of computation, we could play a perfect game of chess”; second, “here is how, with a large but not so unreasonably large amount of computation, we could play a pretty decent game of chess”. Neither of these requires a hypercomputer; even computing the whole search tree is a finite amount of work. And the second was something that could be done to some extent using hardware Shannon could have built. To play grandmaster-level chess using exactly Shannon’s method would require an infeasibly large amount of computing power (which we have since dealt with (1) by discovering alpha-beta and other sorts of pruning and (2) by making then-infeasibly powerful computers). But Shannon did show that machinery of a kind that can exist within the universe can play a not-so-awful game of chess; he genuinely refuted Poe’s argument. And he didn’t appeal to hypercomputers to do it.
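To make “search a couple of plies deep and apply a static evaluation” concrete, here is a minimal sketch in Python. Tic-tac-toe stands in for chess purely to keep the example self-contained (real move generation and a real evaluation function are the hard part, and nothing here is taken from Shannon’s actual paper); the shape of the procedure, bounded-depth minimax with a crude evaluation at the cutoff, is the point.

```python
# Minimal sketch: bounded-depth minimax with a static evaluation at the cutoff.
# Tic-tac-toe stands in for chess so the example is runnable as-is.

def winner(b):
    lines = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]
    for i, j, k in lines:
        if b[i] != '.' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def evaluate(b):
    # Crude static evaluation, playing the role of e.g. a material count in chess.
    return {'X': 1, 'O': -1}.get(winner(b), 0)

def minimax(b, depth, to_move):
    if winner(b) or depth == 0 or '.' not in b:
        return evaluate(b)
    moves = [i for i, c in enumerate(b) if c == '.']
    other = 'O' if to_move == 'X' else 'X'
    vals = [minimax(b[:i] + to_move + b[i+1:], depth - 1, other) for i in moves]
    return max(vals) if to_move == 'X' else min(vals)

def best_move(b, to_move, depth=2):
    # Look only a couple of plies ahead: plays imperfectly, but not randomly.
    other = 'O' if to_move == 'X' else 'X'
    moves = [i for i, c in enumerate(b) if c == '.']
    key = lambda i: minimax(b[:i] + to_move + b[i+1:], depth - 1, other)
    return max(moves, key=key) if to_move == 'X' else min(moves, key=key)

# With two plies of lookahead, O notices X's 0-4-8 diagonal threat and blocks it.
print(best_move('X.O.X....', to_move='O'))   # prints 8
```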
Whereas someone skeptical of the very idea of mechanized epistemology can respond to Solomonoff by saying “yeah, but your algorithm requires a literally infinite amount of computation. It’s not just that present technology falls short of what it needs, but that no possible technology could do it”. A Poe-alike couldn’t make such an argument for chess.
You might reply: yeah, sure, but we can “truncate” a Solomonoff inductor so that it only considers programs of, say, some finite maximum size; then it’s no longer infinite and so long as reality isn’t too complicated it’ll still give good results. Except that unless that finite maximum is so tiny that Solomonoff does nothing useful, your truncated Solomonoff inductor is still too resource-hungry to do anything useful even if we turn the whole observable universe into computronium and let it run for the entire lifetime of that universe so far. Again, a Poe-alike couldn’t have made such an argument for chess; if you really had to, you could build a Babbage-like contraption that plays chess using some sort of brute-force search to a couple of ply. It would play badly but still better than some beginners, and that’s already enough to refute Poe.
That doesn’t mean that Solomonoff induction isn’t a useful notion, or that it can’t provide insight into what’s going on when we interpret the evidence of our senses, or that it can’t be a useful inspiration for more practical attempts at mechanized epistemology. But I think the situation is sufficiently different from that of Poe contemplating the Mechanical Turk to make the analogy not very helpful.
Extending OP’s argument, I would put it like this. Suppose you start off having no idea whatsoever how you might do a thing, and end up having a practical algorithm you can implement on real hardware. You can (in principle) split that up into the following stages: (0) no idea whatsoever, (1) can do it with literally infinite compute, (2) can do it with superexponentially-huge compute, (3) can do it with tile-the-universe-with-computronium compute, (4) can do it with several orders of magnitude more than we have, (5) can do it with something like realistic resources. I suggest that each of those stages may correspond to a comparable advance in understanding. Solomonoff induction takes us from 0 to 1, but pace Peter Thiel this is not always the most important step and I don’t think it’s as big a step as from Poe to Shannon.
“Except that unless that finite maximum is so tiny that Solomonoff does nothing useful, your truncated Solomonoff inductor is still too resource-hungry to do anything useful even if we turn the whole observable universe into computronium and let it run for the entire lifetime of that universe so far.”
Not the case!!! The OEIS (Online Encyclopedia of Integer Sequences) can be viewed as an abridged Solomonoff inductor, and it is useful.
I think your hierarchy is useful, and it’s helped clarify my own thinking. I’ve never really liked the Poe/Shannon argument, because it seemed so historically contingent: if Shannon’s culture had played go instead of chess, then his completely correct insight about the theoretical tractability of game trees wouldn’t have helped us. In your model, he’d have taken us from 0 to 3 instead of 0 to 4, which wouldn’t have been enough to inspire a playable go program (pedantically, you need game trees before you can invent MCTS, but the connection from Shannon to AlphaGo is clearly much looser than from Shannon to Deep Blue).
I think the point is even stronger than that: Solomonoff induction requires not just infinite compute/time but doing something literally logically impossible, because the prior is straight-up uncomputable, not in any real-world tractability sense but as uncomputable as the Halting problem is. There’s a huge qualitative gulf between “we can’t solve this problem without idealised computers with unbounded time” and “we can’t solve this on a computer by definition”. That makes a huge difference to how much use the approach is for “crispening” ideas, IMO.
Yup, actual Solomonoff induction is uncomputable. I’m not sure what you mean by “not just infinite compute/time”, though; given truly infinite computation you absolutely could do it. (Though in a world where that was possible, you’d really want your Solomonoff inductor to consider possible explanations that likewise require an infinite amount of computation, and then you’d be back with the same problem again.) I guess the distinction you’re making is between “requires a finite but absurdly large amount of computation” and “requires literally infinite computation”, and I agree that the latter is what Solomonoff induction requires.
I think that reduces the credibility of the claim “Solomonoff induction is the One True Way to do inference, at least ideally speaking”. But I think the following weaker proposal is intact:
Given two rival explanations of our observations, in the happy (and pretty much unheard-of) case where they have both been expressed precisely in terms of computer programs, all else equal we should consider the shorter program “simpler”, “better”, and “more likely right”.
One way for all else not to be equal is if one program is shorter than the other just because it’s been more highly optimized in some way that doesn’t have much to do with the actual theory it embodies. So rather than “shorter program better” we should really say something more like “shorter program, after making it as small as possible, better”.
Obviously, coming up with an actual program that perfectly explains all our observations is unrealistic; those observations include reading scientific papers, so it seems like the program would need to include a complete Theory Of Everything in physics; those observations include interactions with other humans, so it seems like the program would need to include at least as much intelligence as any person we encounter; these are both famously hard problems that the human race has not yet cracked.
But given two proposals for explaining the same subset of our experience, if we are credibly able to reason about which corresponds (after optimization) to the shorter program, we should prefer the proposal that seems like it has the shorter program.
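As a toy illustration of that proposal (everything here is made up for the example: the two candidate programs, and the crude one-byte-per-character prior), suppose both rivals really had been written down as programs that reproduce the observations. The Solomonoff-style move is then just to weight each surviving candidate by 2^-length and normalize:

```python
# Toy version of "shorter program better", assuming (unrealistically) that both
# rival explanations are already written down as programs reproducing the data.

observations = [0, 1, 4, 9, 16, 25]

# Two made-up rival "explanations", as source-code strings.
prog_a = "f = lambda n: n * n"
prog_b = "f = lambda n: [0, 1, 4, 9, 16, 25][n]"   # lookup table: longer program

def consistent(src, data):
    env = {}
    exec(src, env)                      # run the candidate program
    return all(env["f"](n) == x for n, x in enumerate(data))

def prior(src):
    return 2.0 ** (-8 * len(src))       # shorter source gets exponentially more weight

candidates = [p for p in (prog_a, prog_b) if consistent(p, observations)]
weights = {p: prior(p) for p in candidates}
total = sum(weights.values())
for p, w in weights.items():
    print(f"{w / total:.3g}  {p}")      # essentially all the weight lands on prog_a
```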
In many real cases, it will be completely unclear which of two proposals corresponds to the shorter program after optimization, and in that case we’ll have to use some other heuristic. And even in cases where it seems pretty clear which of two proposals corresponds to the shorter program, we need to be aware that we could be very wrong because what I’ve blithely called “optimization” is uncomputable and we can never be sure there isn’t a shorter program. And it’s not like the gods have handed down to us conclusive reason to think that simpler programs really are more likely to be right; it’s just that heuristics along those lines have worked out pretty well for us, and concepts of “simpler” nearer to “shorter program” and further from “seems simpler to naive human brain” generally seem to work better. So (unless there’s some clever point I’m missing, which there might be) some of EY’s more dramatic claims about how anything other than Solomonoff induction is Wrong and Biased and Irrational seem overblown. But if you treat it as a very promising heuristic rather than a definitely known truth I’m still on board.
Also, you totally can do something that seems to me sufficiently along the lines of Solomonoff induction with an amount of computation that’s merely absurdly large, rather than literally infinite. At step N you try out all programs of length up to N running for up to N steps, and see whether they produce output consistent with your observations to date. Once N gets very large you have discovered things like “among programs shorter than a gigabyte running for less than 10^100 cycles and producing output consistent with these observations, these are the shortest ones, and if we weight them SI-style then the probability distribution for our next observation is this”. And if (as it seems to me kinda plausible that you actually should) you also somehow regard more expensive computations as less probable, then once N gets large enough that you have any candidate programs that produce the right predictions you can start putting upper bounds on the errors in your predictions compared with an ideal no-longer-exactly-Solomonoff inductor, and those errors tend to 0 as the amount of computation you’re willing to do increases.
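Here is a runnable caricature of that step-N procedure. The “programming language” below is deliberately trivial (a program is a bitstring that just repeats itself forever, so the only hypotheses it can express are periodic sequences), which throws away nearly all of Solomonoff’s power but keeps the shape of the loop: enumerate programs up to length N, run each just long enough to check the observations, keep the consistent ones, weight them by 2^-length, and read off a prediction.

```python
# A caricature of the bounded procedure: enumerate, filter for consistency,
# weight by 2**-length, predict. The toy "machine" just repeats its program.

from itertools import product

def run(program, steps):
    """Produce `steps` output bits by cycling through the program's bits."""
    return [program[i % len(program)] for i in range(steps)]

def bounded_inductor(observations, N):
    weights = {0: 0.0, 1: 0.0}          # unnormalized weight on the next bit
    for length in range(1, N + 1):
        for program in product((0, 1), repeat=length):
            out = run(program, len(observations) + 1)
            if out[:len(observations)] == list(observations):
                weights[out[-1]] += 2.0 ** (-length)
    total = sum(weights.values())       # assumes at least one consistent program
    return {bit: w / total for bit, w in weights.items()}

# The shortest consistent programs all continue the 0,1,0,1,... pattern,
# so nearly all the weight goes to predicting 0 next.
print(bounded_inductor([0, 1, 0, 1, 0, 1], N=8))
```

With a genuinely universal language in place of the toy one, the same loop is exactly the absurdly expensive procedure described above.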
To be clear, the amount of computation required for any version of this is absurdly impractical in the real world, and if your ideal is actual Solomonoff induction then you don’t get any error bounds, because you can’t rule out the possibility that some shorter program that hasn’t generated any output yet might do a good job eventually. But it’s not as if you literally can’t do anything Solomonoff-induction-shaped without literally infinite amounts of computation.
I think there’s a sense in which some problems can be uncomputable even with infinite compute, no? For example, if the Halting problem were computable even with literally infinite time, then we could construct a machine that halted when given its own description iff it ran forever when given its own description (sketched after this comment). I do think there’s a distinction beyond just “arbitrarily large finite compute vs. infinite compute”. It seems like either some problems have to be uncomputable even by a hypercomputer, or else the concept of infinite compute time is less straightforward than it seems.
I totally agree on your other points, though. I think the concept of bounded Solomonoff induction could be interesting in itself, although I presume with it you lose all the theoretical guarantees around bounded error. I’d definitely be interested to see if there’s literature on this.
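For reference, here is the diagonalization that comment appeals to, sketched in Python. The `halts` function is a hypothetical oracle, deliberately left unimplemented: the argument is that no program of the same kind as the programs it judges could compute it, since `paradox(paradox)` would then halt exactly when it doesn’t; the reply below is about what changes when the decider is allowed to be a more powerful kind of machine.

```python
# Sketch of the classic halting-problem diagonalization. `halts` is a
# hypothetical oracle; the point is that no ordinary program can implement it
# for ordinary programs.

def halts(program, argument):
    """Hypothetical: True iff program(argument) eventually halts."""
    raise NotImplementedError("no ordinary program can compute this")

def paradox(program):
    # If `halts` were an ordinary program, `paradox` would be one too, and then:
    if halts(program, program):
        while True:   # halts(paradox, paradox) True  => paradox(paradox) never halts
            pass
    else:
        return None   # halts(paradox, paradox) False => paradox(paradox) halts
```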
The halting problem is computable with literally-infinite time. But, to be precise, what this means is that a hypercomputer could determine whether a (nonhyper)computer halts; in a universe containing hypercomputers, we would not be very interested in that, and we’d be asking for something that determines whether a given hypercomputer halts (or something like that; I haven’t given much thought to what corresponds to “halting” for any given model of hypercomputation...) which would be impossible for the same sort of reasons as the ordinary halting problem is impossible for ordinary computers.
But I think it’s only fair to describe this by saying “the halting problem is impossible even with infinite computational resources” if you acknowledge that “the halting problem” is then not a single problem: it’s a family of problems that varies according to what computational resources you’ve got, getting harder when you have more resources to throw at it.