Can you unpack “approximation of Solomonoff induction”? Approximation in what sense?
I walk through each step in the post. You can approximate Turing-complete algorithms with logic gates. And NNs are basically just continuous logic gates, so they can compute the same functions. And there are algorithms like MCMC which can approximate Bayesian inference.
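To make the "NNs are continuous logic gates" step concrete, here is a minimal sketch (my own illustration, not from the post): a single sigmoid unit with hand-picked weights behaves as a soft NAND gate, and since NAND is universal, networks of such units can emulate any Boolean circuit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_nand(a, b, sharpness=10.0):
    """A single sigmoid unit acting as a continuous NAND gate.

    The weights (-1, -1) and bias 1.5 are hand-picked for illustration;
    scaling them up makes the gate arbitrarily close to the Boolean one.
    """
    return sigmoid(sharpness * (1.5 - a - b))

# At the corners of the unit square the unit recovers the NAND truth table:
for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(int(a), int(b), round(soft_nand(a, b), 3))
# 0 0 -> ~1.0, 0 1 -> ~1.0, 1 0 -> ~1.0, 1 1 -> ~0.0
```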
Estimator asked in what sense SI is approximated, not, given a sense, how SI is approximated in that sense. Can you give a metric on which your algorithm's value comes close to SI's value?
I don’t really understand the question. Or what “sense” means in this context.
Are you asking how close the approximation is to the ideal? I would say that depends on the amount of computing power available, and that it approaches the ideal in the limit. But it gives reasonable answers on realistic computers, whereas SI does not.
There is also some loss from the information required to build a finite Turing machine of the right size, as opposed to the infinite number of other structures you could build with logic gates, e.g. a machine that is exactly like a finite Turing machine except that the 15th memory cell is corrupted, etc.
I don’t think this problem is unsolvable, though. There are, for example, Neural Turing Machines, which give the network access to an external differentiable memory.
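For flavor, here is a minimal sketch of the kind of differentiable, content-addressed read an NTM-style head performs; the cosine-softmax addressing and the toy memory/key values are my own illustrative choices, not the published architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_read(memory, key, beta=5.0):
    """Differentiable content-based read (illustrative sketch).

    memory: (N, D) array of memory rows
    key:    (D,) query vector emitted by a controller network
    beta:   sharpness of the addressing
    """
    # Cosine similarity between the key and every memory row.
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    weights = softmax(beta * sims)   # soft, differentiable "address"
    return weights @ memory          # weighted blend of memory rows

memory = np.eye(4)                   # toy 4-slot memory
print(content_read(memory, np.array([0.0, 1.0, 0.0, 0.0])))
```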
An example of a sense would be to define some quantification of how good an algorithm is, and then show that a particular algorithm has a large value for that quantity, compared to SI. In order to rigorously state that X approaches Y “in the limit”, you have to have some index n, and some metric M, such that |M(Xn)-M(Yn)| → 0. Otherwise, you’re simply making a subjective statement that you find X to be “good”. So, for instance, if you can show that the loss in utility in using your algorithm rather than SI goes to zero as the size of the dataset goes to infinity, that would be an objective sense in which your algorithm approximates SI.
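One hypothetical way to instantiate that criterion (my own formalization, not Estimator's): let n index the dataset size, take M to be cumulative log loss L on the first n observations D_n, and ask that the per-symbol regret of a candidate predictor Q relative to the Solomonoff predictor S vanish:

```latex
\[
  \frac{1}{n}\bigl( L(Q, D_n) - L(S, D_n) \bigr) \;\longrightarrow\; 0
  \qquad (n \to \infty)
\]
```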
It should be obvious that SGD over an appropriately general model, with appropriate random inits and continual restarts that keep the max-score solution found so far, will eventually converge on the global optimum, and will do so in expected time similar to or better than any naive brute-force search such as SI (a toy sketch of this restart scheme follows below).
In particular, SGD is good at exploiting any local smoothness in the solution space.
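Here is a minimal sketch of that restart scheme, assuming a toy one-dimensional objective; plain gradient ascent stands in for SGD, and the function names and hyperparameters are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    """Toy multimodal objective (a stand-in, not from the comment)."""
    return np.sin(3 * x) + 0.3 * np.cos(7 * x) - 0.05 * x ** 2

def grad(f, x, eps=1e-5):
    """Numerical gradient, good enough for a 1-D sketch."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def restart_gradient_ascent(n_restarts=50, n_steps=200, lr=0.01):
    """Random inits + gradient steps, keeping the max-score solution found."""
    best_x, best_score = None, -np.inf
    for _ in range(n_restarts):
        x = rng.uniform(-5.0, 5.0)       # random init
        for _ in range(n_steps):
            x += lr * grad(score, x)     # follow local smoothness uphill
        if score(x) > best_score:
            best_x, best_score = x, score(x)
    return best_x, best_score

print(restart_gradient_ascent())
```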