Maybe start by showing how it predicts a sequence like 010101..., then something more complicated like 011011... It starts to get interesting with a sequence like 01011011101111...: how long would it take to converge on the right model there (namely, that each run of 1s is one bit longer than the last)?
True Solomonoff induction is uselessly slow (strictly speaking, it is uncomputable). The relationship of Solomonoff induction to actual induction is like the relationship of Principia Mathematica to a pocket calculator; you don’t use Russell and Whitehead’s methods and notation to do practical arithmetic. Solomonoff induction is a brute-force scan of the whole space of possible computational models for the best fit. Actual induction tends to start a priori with a very narrow class of possible hypotheses, and only branches out to more elaborate ones if those don’t work.
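To make the "brute-force scan" concrete on the example sequences above, here is a toy sketch in Python. It is not Solomonoff induction: instead of all programs for a UTM it scans a tiny hand-picked model class (periodic patterns plus one "growing runs of 1s" generator), and the costs in bits, including `growing_cost=6`, are made-up assumptions standing in for program length. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def periodic(pattern):
    """Return a function giving the first n bits of `pattern` repeated forever."""
    def bits(n):
        return [pattern[i % len(pattern)] for i in range(n)]
    return bits


def growing_runs(n):
    """First n bits of 0 1 0 11 0 111 ...: each run of 1s is one bit longer."""
    out, run = [], 1
    while len(out) < n:
        out.append(0)
        out.extend([1] * run)
        run += 1
    return out[:n]


def hypotheses(max_period=4, growing_cost=6):
    """Yield (name, cost_in_bits, generator) for the toy model class.

    The costs are illustrative assumptions, not real program lengths.
    """
    for length in range(1, max_period + 1):
        for pattern in product((0, 1), repeat=length):
            name = "repeat " + "".join(map(str, pattern))
            yield name, length, periodic(pattern)
    yield "growing runs of 1s", growing_cost, growing_runs


def predict_next(observed, **kwargs):
    """Return P(next bit = 1) as a 2^-cost weighted vote of consistent models.

    Returns None if no model in the class reproduces `observed`.
    """
    n = len(observed)
    weight = {0: Fraction(0), 1: Fraction(0)}
    for _name, cost, gen in hypotheses(**kwargs):
        bits = gen(n + 1)
        if bits[:n] == list(observed):  # keep only models that fit the data
            weight[bits[n]] += Fraction(1, 2 ** cost)
    total = weight[0] + weight[1]
    return weight[1] / total if total else None
```

In this toy setting, after seeing 0101 the cheap periodic models still dominate and the growing-runs model gets only a small share of the vote. After one more bit (01011) every periodic model with period up to 4 is ruled out and the growing-runs model is the only survivor. How quickly that happens depends entirely on the assumed model class and costs, which is exactly what a real universal prior does not let you hand-pick.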
This would be a derivation of F=ma, vs all other possible laws. I am not asking for that. My question is supposedly much simpler: write a binary string corresponding to just one model out of infinitely many, namely F=ma.
If we adopt the paradigm in the article—a totally passive predictor, which just receives a stream of data and makes a causal model of what’s producing the data—then “F=ma” can only be part of the model. The model will also have to posit particular forces, and particular objects with particular masses.
Suppose the input consists of time series for the positions in three dimensions of hundreds of point objects interacting according to Newtonian gravity. I’m sure you can imagine what a program capable of generating such output looks like; you may even have written such a program. If the predictor does its job, then its model of the data source should also be such a program, but encoded in a form readable by a UTM (or whatever computational system we use).
Such a model would possibly have a subroutine which computed the gravitational force exerted by one object on another, and then another subroutine which computed the change in the position of an object during one timestep, as a function of all the forces acting on it. “F=ma” would be implicit in the second subroutine.
This is correct.
Solomonoff induction accounts for all of the data, which is a binary sequence of sensory-level happenings. Its hypothesis would have to contain some subroutine that extracts objects from the sensory data, another that assigns a mass to each of those objects, et cetera. The actual F=ma part would be a very distant abstraction, though it would still be encoded somewhere in the binary sequence.
Solomonoff induction can’t really deal with counterfactuals in the same way that a typical scientific theory can. With a scientific theory we can ask, “What if Jupiter were twice as close?” and then calculate the answer. With Solomonoff induction, we’d have to understand the big binary-sequence hypothesis and isolate the high-level parts of the program, then use just those to calculate the consequences of the counterfactual.
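For comparison, here is roughly what the human-readable version of the two-subroutine model discussed in this thread might look like, as a minimal Python sketch. The names (`gravitational_force`, `step`), the dictionary layout, the unit choice `G = 1.0`, and the semi-implicit Euler update are all assumptions made for illustration. A hypothesis found by Solomonoff induction would be an encoded UTM program with no such labels.

```python
G = 1.0  # gravitational constant in arbitrary units (an assumption)


def gravitational_force(body_a, body_b):
    """Force vector exerted on body_a by body_b under Newtonian gravity."""
    dx = [b - a for a, b in zip(body_a["pos"], body_b["pos"])]
    dist_sq = sum(d * d for d in dx)
    dist = dist_sq ** 0.5
    magnitude = G * body_a["mass"] * body_b["mass"] / dist_sq
    return [magnitude * d / dist for d in dx]


def step(bodies, dt):
    """Advance every body by one timestep (semi-implicit Euler), in place."""
    forces = []
    for i, body in enumerate(bodies):
        total = [0.0, 0.0, 0.0]
        for j, other in enumerate(bodies):
            if i != j:
                f = gravitational_force(body, other)
                total = [t + c for t, c in zip(total, f)]
        forces.append(total)
    for body, force in zip(bodies, forces):
        # "F = ma" appears only here, rearranged as a = F / m.
        accel = [f / body["mass"] for f in force]
        body["vel"] = [v + a * dt for v, a in zip(body["vel"], accel)]
        body["pos"] = [p + v * dt for p, v in zip(body["pos"], body["vel"])]
```

In this readable form a counterfactual is just a changed argument: calling `gravitational_force` with the second body at half the distance gives four times the force, and `step` can be rerun from there. Doing the same with an opaque binary hypothesis would first require finding which bits play the role of `pos`.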