I’m not sure if this comment goes best here, or in the “Against Strong Bayesianism” post. But I’ll put it here, because this is fresher.
I think it’s important to be careful when you’re taking limits.
I think it’s true that “The policy that would result from a naive implementation of Solomonoff induction followed by expected utility maximization, given infinite computing power, is the ideal policy, in that there is no rational process (even using arbitrarily much computing power) that leads to a policy that beats it.”
But say somebody offered you an arbitrarily large-and-fast, but still finite, computer. That is to say, you’re allowed to ask for a googolplex of operations per second and a googolplex of RAM, or even Graham’s number of each, but you have to name a number and then live with it. The above statement does NOT mean that the program you should run on that hyper-computer is a naive implementation of Solomonoff induction. You would still want to use the known tricks for improving the efficiency of Bayesian approximations; that is, things like MCMC, SMC, efficient neural proposal distributions with importance-weighted sampling, efficient pruning of simulations to just the parts that are relevant for predicting input (which, in turn, includes all kinds of causality logic), smart allocation of computational resources between different modes and fallbacks, etc. Such tricks — even just the ones we have already discovered — look a lot more like “intelligence” than naive Solomonoff induction does, even if, when appropriately combined, their limit as computation goes to infinity is the same as the limit of Solomonoff induction as computation goes to infinity.
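To make the finite-compute-tricks point concrete, here is a minimal, purely illustrative sketch of one such trick: self-normalized importance sampling with a hand-rolled proposal, used to approximate a Bayesian posterior mean under a fixed sampling budget. The Gaussian model, the proposal, and every name in it are toy assumptions of mine, not anything from the quoted statement.

```python
import numpy as np

# Toy sketch: self-normalized importance sampling as one finite-compute trick
# for approximating a Bayesian posterior expectation. The model and proposal
# are arbitrary stand-ins (standard-normal prior, unit-variance Gaussian likelihood).

rng = np.random.default_rng(0)

def log_prior(theta):
    # standard normal prior, up to an additive constant
    return -0.5 * theta ** 2

def log_likelihood(theta, data):
    # unit-variance Gaussian likelihood, up to an additive constant
    return -0.5 * np.sum((data - theta) ** 2)

def posterior_mean_is(data, n_samples=10_000):
    """Estimate E[theta | data] with a fixed, finite sampling budget."""
    center = np.mean(data)
    proposal = rng.normal(loc=center, scale=1.0, size=n_samples)  # crude proposal
    log_q = -0.5 * (proposal - center) ** 2                       # proposal log-density, unnormalized
    log_w = (log_prior(proposal)
             + np.array([log_likelihood(t, data) for t in proposal])
             - log_q)
    w = np.exp(log_w - log_w.max())   # subtract the max for numerical stability
    w /= w.sum()
    return np.sum(w * proposal)

data = rng.normal(loc=2.0, scale=1.0, size=20)
print(posterior_mean_is(data))  # close to the analytic posterior mean, on a finite budget
```

None of this is meant as “the” right trick; it just shows the flavor of spending a bounded budget cleverly rather than enumerating programs.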
In other words: the statement “the limit, as amount-of-computation X goes to infinity, of program A strictly beats program B with any finite amount Y of computation, for any B and Y”, or even “the limit as amount-of-computation X goes to infinity of program A is as good as or better than the limit as amount-of-computation Y goes to infinity of program B, for any B”, is true but not very surprising or important, because it absolutely does not imply that “as computation X goes to infinity, program A with X resources beats program B with X resources, for any B”.
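One toy way (my example, with made-up performance curves) to see the gap between those two kinds of claims: suppose the performance of programs A and B at compute budget X is

$$\operatorname{perf}_A(X) = 1 - \frac{1}{\log X}, \qquad \operatorname{perf}_B(X) = 1 - \frac{1}{X}.$$

Both tend to the same optimum of 1 as X goes to infinity, and A’s limit beats B at any fixed finite budget Y; yet $\operatorname{perf}_B(X) > \operatorname{perf}_A(X)$ at every finite X > 1, so B is the program you would actually want to run on the hyper-computer.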
The problem with this definition is that it focuses too much on the details of the computational substrate. Suppose the programming language used has a built-in function for matrix multiplication, and it is twice as fast as any matrix multiplication that could be written within the language. Then any program that does its own matrix multiplication will count as less intelligent than one that uses the built-in function.
“A with X resources beats program B with X resources, for any B” could be true if A is just B with the first few steps precomputed. It focuses too much on the little hackish tricks specific to the substrate.
Maybe say that two algorithms A and B are equivalent up to polynomial factors if there exists a polynomial p(x) such that A with p(x) compute beats B with x compute for all x, and likewise B with p(x) compute beats A with x compute.
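Spelled out slightly more formally (my notation, not the parent comment’s, and reading “beats” loosely as “does at least as well as”), with $\operatorname{perf}_C(x)$ standing for how well algorithm C performs with x compute:

$$A \sim_{\text{poly}} B \;\iff\; \exists\,\text{polynomial } p \;\;\forall x:\;\; \operatorname{perf}_A(p(x)) \ge \operatorname{perf}_B(x) \;\text{ and }\; \operatorname{perf}_B(p(x)) \ge \operatorname{perf}_A(x).$$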
Yes, your restatement feels to me like a clear improvement.
In fact, considering it, I think that if algorithm A is “truly more intelligent” than algorithm B, and f(x) is the compute it takes for B to perform as well as or better than A running with x compute, then I’d expect f(x) could even be super-exponential in x. Exponential would be the lower bound; that’s what you’d get from a mere incremental improvement in pruning. From this perspective, anything polynomial would be “just implementation”, not “real intelligence”.
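Putting rough symbols on that hierarchy (my gloss, with f(x) as above): $f(x) = \operatorname{poly}(x)$ would mean the gap is “just implementation”; $f(x) = 2^{\Theta(x)}$ is the kind of gap a mere incremental improvement such as better pruning buys; and f(x) eventually exceeding $c^x$ for every constant c, i.e. super-exponential growth, would be the signature of A being “truly more intelligent”.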
Seems just false. If you’re not worried about confronting agents of equal size (which is equally a concern for a Solomonoff inductor), then a naive bounded Solomonoff inductor running on a Grahamputer will give you essentially the same result for all practical purposes as a Solomonoff inductor. That’s far more than enough compute to contain our physical universe as a hypothesis. You don’t bother with MCMC on a Grahamputer.
If we’re positing a Grahamputer, then “yeah but it’s essentially the same if you’re not worried about agents of equal size” seems too loose.
In other words: with great compute power comes great compute responsibility.