Using only speed to evaluate models lands you with a lookup table that stores the results you want. So you have to trade off speed and length: The speediest table of math results has length O(N) and speed O(N) (maybe? Not at all sure). The shortest general program has length O(log(N)) and uncomputably fast-growing time. So if we think of a separable cost function F(length)+G(time), as long as F doesn’t grow super-fast nor G super-slow, eventually the lookup table will have better score than brute-force search.
You’re Manfred! Huh. I noticed you making an unusual amount of sense in some comment threads, but didn’t make the connection. Great to meet you again!
Using a combination of length and time sounds right. The speed prior uses length+log(time), I don’t know if it’s the best choice but it seems okay?
That said, I don’t quite understand your post. The universal prior is just a prior, not any particular approximation scheme. Sure, it’s equivalent to a mixture of all programs that predict bits exactly, but it’s also equivalent to a mixture of all programs that compute probability distributions over bits. (That fact is surprising and nontrivial, but true. It’s in the Li and Vitanyi book.) So you can approximate it on noisy data just fine.
Yeah, I know that I’m somewhat sloppily identifying the priors with the programs that make up most of their weight. It makes it more convenient for me to think about what agents using those priors would do—though I am probably missing some details that would stem from using a computable approximation rather than the possibly uncomputable one of looking at the most successful few programs.
And it serves me right for not googling “speed prior.” I forgot it was length+log(time). I’m sure computational complexity people would have much more interesting things to say than me about why that might get you plenty of mathematics programs that are neither brute force nor lookup tables. Or maybe it’s just that taking the log of time to turn it into “bits” is the number one obvious thing to do.
If we think of the Solomonoff prior as the “real answer” and a speed-like prior as a convenience used for making computable predictions when computation has a cost, we could get some kind of principled answer from estimates of the benefit of predictive accuracy and the cost of time. I spent a few minutes and couldn’t figure it out—worst case reasoning about predictive accuracy gets stuck on some very rare worst cases, and I’m not sure how to do the average case reasoning right.
Using only speed to evaluate models lands you with a lookup table that stores the results you want. So you have to trade off speed and length: The speediest table of math results has length O(N) and speed O(N) (maybe? Not at all sure). The shortest general program has length O(log(N)) and uncomputably fast-growing time. So if we think of a separable cost function F(length)+G(time), as long as F doesn’t grow super-fast nor G super-slow, eventually the lookup table will have better score than brute-force search.
Ideally you want to find some happy medium—this is reminding my of my old post on approximate induction.
You’re Manfred! Huh. I noticed you making an unusual amount of sense in some comment threads, but didn’t make the connection. Great to meet you again!
Using a combination of length and time sounds right. The speed prior uses length+log(time), I don’t know if it’s the best choice but it seems okay?
That said, I don’t quite understand your post. The universal prior is just a prior, not any particular approximation scheme. Sure, it’s equivalent to a mixture of all programs that predict bits exactly, but it’s also equivalent to a mixture of all programs that compute probability distributions over bits. (That fact is surprising and nontrivial, but true. It’s in the Li and Vitanyi book.) So you can approximate it on noisy data just fine.
Nice to meet you again too :)
Yeah, I know that I’m somewhat sloppily identifying the priors with the programs that make up most of their weight. It makes it more convenient for me to think about what agents using those priors would do—though I am probably missing some details that would stem from using a computable approximation rather than the possibly uncomputable one of looking at the most successful few programs.
And it serves me right for not googling “speed prior.” I forgot it was length+log(time). I’m sure computational complexity people would have much more interesting things to say than me about why that might get you plenty of mathematics programs that are neither brute force nor lookup tables. Or maybe it’s just that taking the log of time to turn it into “bits” is the number one obvious thing to do.
If we think of the Solomonoff prior as the “real answer” and a speed-like prior as a convenience used for making computable predictions when computation has a cost, we could get some kind of principled answer from estimates of the benefit of predictive accuracy and the cost of time. I spent a few minutes and couldn’t figure it out—worst case reasoning about predictive accuracy gets stuck on some very rare worst cases, and I’m not sure how to do the average case reasoning right.