A couple of things. First, a minimal circuit is not the same as a speed-prior-minimal algorithm. Minimal circuits have to be minimal in width + depth, so a GLUT would definitely lose out. Second, even if you’re operating on pure speed, I think there are sets of tasks that are large enough that a GLUT won’t work. For example, consider the task of finding the minimum of an arbitrary convex function. Certainly for the infinite set of all possible convex functions (on the rationals, say), I would be pretty surprised if something like gradient descent weren’t the fastest way to do that. Even if you restrict to only finitely many convex functions, if your set is large enough it still seems hard to do better than gradient descent, especially since looking up the solution in a huge GLUT could be quite expensive (how do you do the lookup? a hash table? a binary tree? I’m not sure if those would be good enough here).
First, a minimal circuit is not the same as a speed-prior-minimal algorithm. Minimal circuits have to be minimal in width + depth, so a GLUT would definitely lose out.
I don’t really understand this—my understanding is that with a minimal circuit, you want to minimize the number of gates in the circuit, and the circuit must be a DAG (if you’re allowed to have loops + clocks, then you can build a regular computer, and for complex tasks the problem should be very similar to finding the shortest program and implementing it on a universal circuit).
But then I can create a Turing Machine that interprets its input as a circuit, and simulates each gate of the circuit in sequence. Then the running time of the TM is proportional to the number of gates in the circuit, so an input with minimal running time should be a circuit with the minimal number of gates. This is not technically a Universal TM, since loop-free circuits are not Turing-complete, but I would expect a speed prior using such a Turing Machine would be relatively similar to a speed prior with a true UTM.
EDIT: I no longer endorse the above point. Just because there exists a speed prior that aligns with minimal circuits doesn’t mean that this is the “typical” speed prior. In general there really should be a difference between a “minimal circuit” and a “fastest program” (precisely the depth + width vs. depth point Evan made in the parent comment).
For example, consider the task of finding the minimum of an arbitrary convex function. Certainly for the infinite set of all possible convex functions (on the rationals, say), I would be pretty surprised if something like gradient descent weren’t the fastest way to do that.
I agree that if you have to work on an infinite set of inputs, a GLUT is not the way to do that. I was thinking of the case where you have to work on a small finite set of inputs (hence why I talk about the optimal trajectory instead of the optimal policy), which is always going to be the case in the real world. But this is too pedantic, we can certainly think of the theoretical case where you have to work on an infinite set of inputs. I was mostly trying to use the GLUT as an intuition pump, not arguing that it was always better.
In the case with infinite inputs, I still have the intuition that meta learning is what you do when you don’t know enough about the problem to write down the good heuristics straight away, as opposed to being the fastest way of solving the problem. But I agree that the fastest solution won’t be a GLUT; I’m thinking more a combination of really good heuristics that “directly” solve the problem. (Part of this is an intuition that for any reasonably structured set of inputs, value-neutral optimization of a fixed objective is very inefficient.)
A couple of things. First, a minimal circuit is not the same as a speed-prior-minimal algorithm. Minimal circuits have to be minimal in width + depth, so a GLUT would definitely lose out. Second, even if you’re operating on pure speed, I think there are sets of tasks that are large enough that a GLUT won’t work. For example, consider the task of finding the minimum of an arbitrary convex function. Certainly for the infinite set of all possible convex functions (on the rationals, say), I would be pretty surprised if something like gradient descent weren’t the fastest way to do that. Even if you restrict to only finitely many convex functions, if your set is large enough it still seems hard to do better than gradient descent, especially since looking up the solution in a huge GLUT could be quite expensive (how do you do the lookup? a hash table? a binary tree? I’m not sure if those would be good enough here).
I don’t really understand this—my understanding is that with a minimal circuit, you want to minimize the number of gates in the circuit, and the circuit must be a DAG (if you’re allowed to have loops + clocks, then you can build a regular computer, and for complex tasks the problem should be very similar to finding the shortest program and implementing it on a universal circuit).
But then I can create a Turing Machine that interprets its input as a circuit, and simulates each gate of the circuit in sequence. Then the running time of the TM is proportional to the number of gates in the circuit, so an input with minimal running time should be a circuit with the minimal number of gates. This is not technically a Universal TM, since loop-free circuits are not Turing-complete, but I would expect a speed prior using such a Turing Machine would be relatively similar to a speed prior with a true UTM.
EDIT: I no longer endorse the above point. Just because there exists a speed prior that aligns with minimal circuits doesn’t mean that this is the “typical” speed prior. In general there really should be a difference between a “minimal circuit” and a “fastest program” (precisely the depth + width vs. depth point Evan made in the parent comment).
I agree that if you have to work on an infinite set of inputs, a GLUT is not the way to do that. I was thinking of the case where you have to work on a small finite set of inputs (hence why I talk about the optimal trajectory instead of the optimal policy), which is always going to be the case in the real world. But this is too pedantic, we can certainly think of the theoretical case where you have to work on an infinite set of inputs. I was mostly trying to use the GLUT as an intuition pump, not arguing that it was always better.
In the case with infinite inputs, I still have the intuition that meta learning is what you do when you don’t know enough about the problem to write down the good heuristics straight away, as opposed to being the fastest way of solving the problem. But I agree that the fastest solution won’t be a GLUT; I’m thinking more a combination of really good heuristics that “directly” solve the problem. (Part of this is an intuition that for any reasonably structured set of inputs, value-neutral optimization of a fixed objective is very inefficient.)