Does it mean then that neural networks start with a completely crazy model of the real world, and slowly modify this model to better fit the data
This seems like a good description to me.
as opposed to jumping between model sets that fit the data perfectly, as Solomonoff induction does?
I’m not an expert in Solomonoff induction, but my impression is that each model set is a subset of the model set from the previous step. That is, you implicitly consider every possible output string by considering every possible program that could generate those strings. I assume a stochastic program (like ‘flip a coin n times and output 1 for heads and 0 for tails’) is expressed as some algorithmic description followed by a random seed, so that each algorithm is itself deterministic, but the set of algorithms over all possible seeds has the stochastic properties the definition requires.
As we get a new piece of the output string (perhaps we see it move from “1100” to “11001”), we rule out any program that would not have output “11001,” which includes about half of our surviving coin-flip programs and about 90% of our remaining 10-sided die programs. So the class of models that “fit the data perfectly” is very broad, and you could imagine neural networks as estimating the mean of that class directly, instead of enumerating every instance of the class and then taking their mean.
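To make the filtering step concrete, here is a toy sketch in Python. It is not real Solomonoff induction: it assumes each "program" is just a fixed algorithm plus an explicit seed that spells out its output, it only looks at outputs of a fixed length of 6 symbols, and it weights every surviving program equally rather than by program length. The names `enumerate_programs`, `surviving`, and `mean_next_bit` are made up for illustration.

```python
from itertools import product

def enumerate_programs(alphabet, length):
    """Toy stand-in for 'algorithm + seed': one deterministic program per
    seed, where the seed is simply the string the program outputs."""
    return ["".join(seed) for seed in product(alphabet, repeat=length)]

def surviving(programs, observed):
    """Keep only the programs whose output starts with the observed string."""
    return [p for p in programs if p.startswith(observed)]

def mean_next_bit(programs, observed):
    """Average next symbol over the survivors (uniform weights, an assumption)."""
    nxt = [int(p[len(observed)]) for p in surviving(programs, observed)]
    return sum(nxt) / len(nxt)

LENGTH = 6  # assumed output length, chosen only to keep the enumeration small
coin = enumerate_programs("01", LENGTH)
die = enumerate_programs("0123456789", LENGTH)

# Each new model set is a subset of the previous one.
coin_before = surviving(coin, "1100")
coin_after = surviving(coin_before, "11001")
die_before = surviving(die, "1100")
die_after = surviving(die_before, "11001")

coin_ruled_out = 1 - len(coin_after) / len(coin_before)
die_ruled_out = 1 - len(die_after) / len(die_before)

print(f"coin programs ruled out by the new symbol: {coin_ruled_out:.0%}")
print(f"die programs ruled out by the new symbol: {die_ruled_out:.0%}")
print("mean next bit over surviving coin programs:", mean_next_bit(coin, "11001"))
```

Under these assumptions, seeing the fifth symbol removes half of the surviving coin-flip programs and nine tenths of the surviving die programs, and the prediction is the average over whatever is left. A neural network, on this picture, would be trying to approximate that average without ever listing the survivors.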