Shane_Legg comments on Brief Break

Shane_Legg 31 Aug 2008 23:37 UTC
3 points
Eli:

Thanks. That clears things up. Enjoy your break. Maybe you should not post quite so much? You really do seem to be writing rather a lot these days. By the time I get to replying to some of your comments you’ve already written another 5 posts!

Tim:

Answering this question starts to feel a bit like living in the movie Groundhog Day. :-)

Usually the reference machine is taken to have a low state x symbol complexity, so you can’t hide much in it. In other words the reference machine has to be in some sense simple.

Now look at the Kolmogorov complexity function. As you mention, if somebody else uses a different reference machine their measured Kolmogorov complexity will be different, where the maximal difference is bounded by some constant. How big is this bound? Pretty small. Many types of simple Turing machines have been shown to be able to simulate each other with a few hundred bits of input. It’s also trivial for a serial machine to simulate a parallel machine. Remember that Kolmogorov complexity is a measure of information content, in the sense of shortest description. It’s not a measure of how much computation was performed. There are other measures for that which are more complex...

Finally, look at the Solomonoff bound (bottom of page 38). There you see the Kolmogorov complexity of the true model of the environment. If you’re using a different simple reference machine, this bound might go up by a few hundred bits. Is this a big deal? Well, yes: if you are using only a few bytes of input data, and that’s all. But then Bayesian inference in general will have problems as your prior will strongly affect your posterior. The reference machine problem is the same issue, but in different language. However, what if you have more data, say a few kB or more, maybe much much more? In this case taking a few bytes longer to converge isn’t really a bit deal. Especially considering that this is convergence for any unknown computable hypothesis. Say you want to predict the stock market, or results from particle physics experiments, or sentences in a book. Are a few bytes of extra data for the convergence bound going to make much difference? Not really. The Solomonoff predictor is still going to kick some serious butt.

Of course it’s not computable, has to be approximated in practice… etc. etc. So why bother with all this? I see it as a kind of “mathematical philosophy”. You take ideas about induction, learning, computation etc. and really nail them down hard in formal mathematics and then study what you’ve got. I think this gives you some insights into the nature of learning, intelligence etc. Of course, this is a rather subjective point. My own AGI project that I’m developing with some of my research buddies isn’t directly based on Solomonoff Induction and AIXI, but we do draw on some related works (such as the universal intelligence measure suitably computably interpreted) and I do sometimes use AIXI as a kind of mental framework to think about some kinds of AGI design issues.