Any useful CPU is, by my definition, Turing universal.
Solomonoff induction does exhaustive search for any amount of data; is part of your claim that as data → infinity, NN + SGD → Solomonoff?
You can think of Solomonoff induction as iterating over all programs/circuits by size, evaluating each on all the data, and so on.
A sufficiently wide NN trained with SGD can search the full circuit space up to some depth D across the data set efficiently, reusing subcomputations across sparse subcircuit solutions (lottery tickets).
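A toy sketch of that enumerate-and-evaluate picture (my illustration, not a claim about what anyone actually runs): boolean formulas stand in for programs/circuits, formula size plays the role of description length, and the prediction is a 2^-size weighted vote among formulas consistent with the data.

```python
# Toy "iterate over all programs/circuits by size, evaluate each on all
# the data" sketch. Boolean formulas over the input bits stand in for
# programs; 2**-size stands in for the Solomonoff-style prior.

def formulas(n_inputs, size):
    """Yield boolean formulas with exactly `size` nodes, as nested tuples."""
    if size == 1:
        for i in range(n_inputs):
            yield ("x", i)
        return
    for sub in formulas(n_inputs, size - 1):
        yield ("not", sub)
    for left in range(1, size - 1):
        for a in formulas(n_inputs, left):
            for b in formulas(n_inputs, size - 1 - left):
                yield ("and", a, b)
                yield ("or", a, b)

def evaluate(f, x):
    op = f[0]
    if op == "x":
        return x[f[1]]
    if op == "not":
        return 1 - evaluate(f[1], x)
    if op == "and":
        return evaluate(f[1], x) & evaluate(f[2], x)
    return evaluate(f[1], x) | evaluate(f[2], x)

def predict(data, query, max_size=8):
    """2**-size weighted vote among all formulas consistent with `data`."""
    votes = [0.0, 0.0]
    for size in range(1, max_size + 1):
        for f in formulas(len(query), size):
            if all(evaluate(f, x) == y for x, y in data):
                votes[evaluate(f, query)] += 2.0 ** -size
    return votes[1] / (votes[0] + votes[1])

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1)]  # consistent with OR and XOR
print(predict(data, (1, 1)))                    # most prior mass favors y = 1
```

Real Solomonoff induction enumerates programs for a universal machine under a prefix-free 2^-length prior; the sketch only mirrors the by-size enumeration and the exhaustive evaluation on all the data.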
Thanks for explaining how the exhaustive search would work: a big network can exhaustively search over smaller network configurations. I believe that.
However, a CPU is not Turing complete (what is "Turing universal"?); a CPU with an infinite read/write tape is Turing complete. This matters because Solomonoff induction is a mixture of Turing machines. There are simple functions transformers can't learn, such as "print the binary representation of the input + 1"; they run out of room. Solomonoff induction is not limited in this way.
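To make the "runs out of room" point concrete, here is a toy counting check (the 4096-token context length is just an assumed example, not any particular model's): the answer for input 2^k - 1 needs k + 1 bits, so any fixed context is eventually exceeded, whereas a machine with an unbounded tape never is.

```python
# Toy illustration: the output "binary representation of the input + 1"
# needs about log2(n) symbols, so it eventually exceeds any fixed context
# length, while an unbounded tape does not run out of room.
CONTEXT_LEN = 4096                      # assumed, hypothetical context size
for k in (10, 100, 1000, 10000):
    n = 2 ** k - 1                      # an input whose successor needs k + 1 bits
    out_bits = len(bin(n + 1)) - 2      # symbols needed to print n + 1
    print(k, out_bits, out_bits <= CONTEXT_LEN)
```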
Practical transformers are also usually (always?) used with exchangeable sequences, while Solomonoff inductors operate on general sequences. I can imagine ways around this (use an RNN and many epochs with a single sequence), so it's maybe not a fundamental limit, but it is still a big difference between neural nets in practice and Solomonoff inductors.
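A minimal sketch of that workaround, under my own toy assumptions (PyTorch, a small GRU, one short binary sequence): train for many epochs on a single non-exchangeable sequence via next-token prediction, rather than on shuffled i.i.d. examples.

```python
# Toy sketch: online next-token prediction on one fixed, non-exchangeable
# sequence, repeated for many epochs, instead of i.i.d. shuffled batches.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab = 2
seq = torch.tensor([([0, 1, 1] * 171)[:512]])      # one periodic binary sequence
emb = nn.Embedding(vocab, 32)
rnn = nn.GRU(32, 64, batch_first=True)
head = nn.Linear(64, vocab)
params = list(emb.parameters()) + list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

for epoch in range(50):                             # many passes over the same sequence
    x, y = seq[:, :-1], seq[:, 1:]                  # predict each next token
    h, _ = rnn(emb(x))
    loss = nn.functional.cross_entropy(head(h).transpose(1, 2), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())                                  # falls as the pattern is learned
```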