Thanks for explaining how exhaustive search can work—a big network can exhaustively search smaller network configurations. I believe that.
However, a CPU is not Turing complete (is "Turing universal" the right term?); a CPU with an infinite read/write tape is Turing complete. This matters, because Solomonoff induction is a mixture of Turing machines. There are simple functions transformers can't learn, such as "print the binary representation of the input + 1": the output length grows without bound, so for large enough inputs they run out of room. Solomonoff induction is not limited in this way.
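To make the "runs out of room" point concrete, here's a toy sketch (my own construction, not anything from the parent comment): the function's output length grows like log₂(n), so any model with a fixed output budget eventually truncates, while an unbounded-tape machine does not. The `max_len` budget is a stand-in assumption for a fixed context window.

```python
def f(n: int) -> str:
    """The 'simple function': binary representation of n + 1."""
    return bin(n + 1)[2:]

def fixed_budget_model(n: int, max_len: int = 8) -> str:
    """Stand-in for a fixed-size model: it can emit at most max_len symbols."""
    return f(n)[:max_len]  # truncates once the true answer outgrows the budget

# For small inputs the budgeted model agrees with f...
assert fixed_budget_model(5) == f(5)  # both '110'
# ...but any fixed budget is eventually exceeded: f(2**20) needs 21 bits.
assert fixed_budget_model(2**20) != f(2**20)
```

The specific budget doesn't matter; the point is that for any fixed budget there is an input whose correct output doesn't fit.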
Practical transformers are also usually (always?) trained on exchangeable sequences, while Solomonoff inductors operate on general sequences. I can imagine ways around this (e.g., use an RNN and train for many epochs on a single sequence), so maybe it's not a fundamental limit, but it's still a big difference between neural nets in practice and Solomonoff inductors.
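To illustrate the single-ordered-sequence regime (a toy of my own, not a Solomonoff inductor): a Laplace-smoothed bigram predictor reads one sequence online and predicts each next bit from the prefix so far. On an ordered sequence it exploits structure that any exchangeable (order-invariant) treatment of the same symbols would throw away.

```python
import random
from collections import defaultdict

def online_bigram_accuracy(seq: str) -> float:
    """One online pass over a single binary sequence, predicting each next bit."""
    counts = defaultdict(lambda: [1, 1])  # Laplace prior over next bit, per context
    correct = 0
    for prev, nxt in zip(seq, seq[1:]):
        c0, c1 = counts[prev]
        pred = '1' if c1 > c0 else '0'
        correct += (pred == nxt)
        counts[prev][int(nxt)] += 1  # update counts after predicting
    return correct / (len(seq) - 1)

# On an ordered, non-exchangeable sequence the online predictor does well...
acc_ordered = online_bigram_accuracy('01' * 500)
# ...while shuffling the same symbols destroys the sequential structure.
random.seed(0)
shuffled = list('01' * 500)
random.shuffle(shuffled)
acc_shuffled = online_bigram_accuracy(''.join(shuffled))
assert acc_ordered > 0.9 > 0.0 and acc_ordered > acc_shuffled
```

This is only meant to show what "operating on a general sequence" buys you; an actual Solomonoff inductor mixes over all programs rather than bigram counts.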