We can sort of see what might cause someone to change their views on what their generic priors should look like by looking at semi-historical examples.
Early on in the theory of computable functions it seemed like all computable functions might be primitive recursive. Presumably, if one didn’t know about things like the Ackermann function, you’d have no issue phrasing a general prior in terms of primitive recursive functions.
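For concreteness, here is a minimal Python sketch of the Ackermann function, the standard example of a total computable function that is not primitive recursive (small values are easy to check by hand, which is the sense in which a brain can compute something outside the class):

```python
def ackermann(m, n):
    # Total and computable, but grows faster than any primitive recursive function;
    # the nested recursion on m is what no fixed tower of primitive recursions can match.
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m - 1, n - 1))

print([ackermann(m, 2) for m in range(4)])  # [3, 4, 7, 29]
# ackermann(4, 2) = 2**65536 - 3 already has 19,729 decimal digits.
```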
Another example is actually in your text, where people realized that quantum systems seemed to be able to do things that non-quantum systems could not (within certain time bounds).
Thus, updating our general framework seems to happen when aspects of the universe around us suggest that modeling them requires a larger class of functions than we anticipated. In the case of primitive recursive functions, it turned out that, empirically, the human brain could calculate a specific function that wasn’t primitive recursive.
For what it is worth, I don’t share your confidence that priors won’t need to be drastically re-examined. One issue that Eliezer and others have observed with the Solomonoff prior is that it weights hypotheses by description length alone, regardless of how much computation they require. While privileging polynomial-time descriptions might help, it isn’t clear how one should distinguish between two Turing machines, one which runs in a very short time (say degree 2) but has a much larger number of states, and another which runs in a long time (say degree 20) but has far fewer states. Which one of those is simpler?
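As a toy illustration of that tension (a sketch with made-up numbers, not anyone’s actual proposal): score each machine once by description size alone, as a Solomonoff-style prior effectively does, and once with an ad-hoc log-runtime penalty added. The state counts, input size, and the particular penalty below are all arbitrary choices; the point is only that the two scores can rank the same pair of machines in opposite orders.

```python
import math

# Two hypothetical machines describing the same data (all numbers made up):
#   A: big description (many states) but fast   -- runtime ~ n**2
#   B: small description (few states) but slow  -- runtime ~ n**20
machines = {
    "A (big, degree 2)":    {"states": 300, "degree": 2},
    "B (small, degree 20)": {"states": 20,  "degree": 20},
}

n = 10**6  # number of observations / input size, chosen arbitrarily

for name, m in machines.items():
    runtime = n ** m["degree"]
    length_only = m["states"]                          # Solomonoff-style: ignore time
    time_penalized = m["states"] + math.log2(runtime)  # one ad-hoc way to charge for time
    # Lower score = simpler = higher prior (think of the score as a codelength).
    print(f"{name}: length-only = {length_only}, length + log2(time) = {time_penalized:.0f}")

# With these numbers B looks simpler under the first score (20 vs 300)
# and A looks simpler under the second (~340 vs ~419).
```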
On the problem of distinguishing between Turing machines of the kinds you mentioned, does Jürgen Schmidhuber’s idea of a speed prior help at all? Searching for “speed prior” here on Less Wrong didn’t really turn up any previous discussion.
I discuss that concept here: http://alife.co.uk/essays/the_one_true_razor/
Hmm, I had not seen the speed prior before. It seems to make strong testable predictions about how the universe functions. I’ll have to look into it.
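For reference, and glossing over the exact definition via Schmidhuber’s FAST search procedure: schematically, the Solomonoff-style prior weights a program p producing the data x only by its length |p|, while the speed prior also discounts by the running time t_p. A commonly quoted approximation of the contrast is

```latex
M(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-|p|}
\qquad \text{versus} \qquad
S(x) \;\approx\; \sum_{p \,:\, U(p) = x} \frac{2^{-|p|}}{t_p},
```

so a hypothesis that is short but astronomically slow is heavily discounted under S but not under M; that is the source of the testable predictions mentioned above.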
I have given one answer in the post, selected somewhat arbitrarily (but with the desirable property of working more or less correctly for physical theories we have seen so far). I think basing things on circuits rather than TM’s is clearly a better start.
For one thing, I don’t know what “degree 2” means for you. Is that in terms of the size of the universe? The number of the current observation? Both of these have significant problems (the first fails if the universe gets padded with junk; the second fails if your observations don’t begin at the beginning of time).
For another thing, Turing machines are a horrible way to measure the degree of polynomial run-times. There is basically no reason to believe that you get a reasonable measure out, unlike with circuits. Of course, using circuits introduces other issues, which require new solutions; I suggest one in the post.
Of course, the post just contains my best try. I would be happy to see a better go at it, but I would be very surprised if it was based on any inherently uniform model of computation which is known today.
I was thinking in terms of number of observations. I agree that this has problems.
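As a rough illustration of why a size measure attached to circuits is better behaved (a toy sketch only, not the construction from the post): a circuit family gives you a concrete gate count at every input length n, so how the description grows with n is unambiguous, whereas for a Turing machine the “degree” depends on what you decide n counts.

```python
import math

# Toy circuit families (not the construction from the post): the "description"
# at input length n is just a gate count, so its growth with n is unambiguous.
def parity_circuit_size(n):
    # Parity of n bits: a chain of n - 1 XOR gates, so linear growth.
    return max(n - 1, 0)

def pairwise_equality_circuit_size(n):
    # Compare all pairs of n bits: ~ n*(n-1)/2 comparisons, so quadratic growth.
    return n * (n - 1) // 2

for family, size in [("parity", parity_circuit_size),
                     ("all-pairs equality", pairwise_equality_circuit_size)]:
    n1, n2 = 1000, 2000
    # Estimate the growth exponent from two sample sizes: size ~ n**k.
    k = math.log(size(n2) / size(n1), 2)
    print(f"{family}: size({n1}) = {size(n1)}, size({n2}) = {size(n2)}, "
          f"estimated exponent ~ {k:.2f}")
```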