Wha? Then you must order the equivalent theories by running time, not code length. The two are frequently opposed: for example, the fastest known algorithm for matrix multiplication (in the big-O sense) is very complex compared to the naive one. In short, I feel you’re only digging yourself deeper into the hole of heresy.
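For concreteness, a minimal sketch of that trade-off (my own example, not from the comment): the naive matrix multiply is only a few lines but runs in O(n^3) time, while the asymptotically faster algorithms (Strassen and its successors) take far more code to write down.

```python
# Naive matrix multiplication: very short code, O(n^3) running time.
# Strassen-style algorithms are asymptotically faster (~O(n^2.81) or better)
# but are considerably longer and more intricate.

def naive_matmul(A, B):
    """Multiply two square n-by-n matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```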
Well, firstly, who said I cared at all about ‘heresy’? I’m not replying here in order to demonstrate my adherence to the First Church Of Bayes or something...
And while there are, obviously, occasions where ordering by running time and code length are opposed, in general when comparing two arbitrary programs which generate the same output, the longer one will also take longer. This is obvious when you consider it—if you have an arbitrary program X from the space of all programs that generate an output Y, there can only be a finite number of programs that generate that output more quickly. However, there are an infinite number of programs in the sample space that will generate the output more slowly, and that are also longer than X—just keep adding an extra ‘sleep 1’ before it prints the output, to take a trivial example.
In general, the longer the program, the more operations it performs and the longer it takes, when you’re sampling from the space of all possible programs. So while run time and code length aren’t perfectly correlated, they’re a very decent proxy for each other.
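As a rough illustration of the 'sleep 1' padding (my own sketch, assuming the output Y is just a fixed string):

```python
import time

def program_x():
    # a short program that prints the output Y
    print("Y")

def padded_program(extra_sleeps=3):
    # a longer program with the same output Y: each added sleep makes it
    # both longer (more code) and slower (more running time)
    for _ in range(extra_sleeps):
        time.sleep(1)
    print("Y")
```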
if you have an arbitrary program X from the space of all programs that generate an output Y, there can only be a finite number of programs that generate that output more quickly.
Amusingly, this statement is false. If a program Z is faster than X, then there exist infinitely many versions of Z that also run faster than X: just add some never-executed code under an if(false) branch. I’m not sure whether your overall argument can be salvaged.
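A minimal sketch of that counterexample (my own illustration, again taking the output Y to be a fixed string): the dead branch can be made arbitrarily long without adding a single executed instruction.

```python
def program_z():
    # program Z: prints Y quickly
    print("Y")

def padded_z():
    # longer than Z, yet just as fast: the branch below never executes,
    # so it adds code length but no running time
    if False:
        for _ in range(10**9):
            pass  # arbitrarily much dead code could go here
    print("Y")
```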
You’re quite correct, there. I was only including code paths that can ever actually be executed, in the same way I wouldn’t count comments as part of the program. This seems to me to be the correct thing to do, and I believe one could come up with some more rigorous reasoning along the lines of my previous comment, but I’m too tired right now to do so. I’ll think about this...
I was only including code paths that can ever actually be executed...
Wouldn’t a meta-algorithm that determines which paths are executable in a given algorithm be unable to do so for every possible algorithm, unless it were functionally equivalent to a halting oracle?
I’m not sure how problematic this is to your idea, but it’s one advantage that the simpler system of just counting total lines has.
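To make the reachability question above concrete, here is a rough sketch of the reduction (my own construction, not something from the thread): a fully general “dead-path detector” would double as a halting oracle.

```python
def build_instance(P, i):
    """Given any program P (a callable) and an input i, return a program whose
    second statement is reachable if and only if P halts on i."""
    def wrapped():
        P(i)                # runs forever exactly when P does not halt on i
        print("reachable")  # executed iff P halts on i
    return wrapped

# A detector that could say, for every program, which paths can ever execute
# would tell us whether print("reachable") runs here, and therefore whether
# P halts on i: that is, it would be a halting oracle.
```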
I think there’s a difference between looking at a theory as data versus looking at it as code.
You look at a theory as code when you need to use the theory to predict the future of something it describes. (E.g., will it rain?) For this purpose, theories that generate the same predictions are equivalent; you don’t care about their size. In fact, even theories with different predictions can be considered equivalent, as long as their predictions are close enough for your purpose. (See Newtonian vs. relativistic physics applied to predicting kitchen-sink performance.) You do care about how fast you can run them, though.
By contrast, you look at a theory as data when you need to reason about theories, and “make predictions” about them, particularly unknown theories related to known ones. As long as two theories make exactly the same predictions, there’s little need to reason about them at all. But if they predict differently for something you haven’t tested yet, and you need to take an action now whose outcome depends on the result of that future test (simple example: a bet), then you need to guess which of them is more likely.
You need something like a meta-theory that predicts which of the two is more likely to be true. Occam’s razor is one of those meta-theories.
Thinking about it more, this isn’t quite a disagreement with the post immediately above; it’s not immediately obvious to me that a simpler theory is easier to reason about (though intuition says it should be). But I don’t think Occam’s razor is about how easy it is to reason about theories; it just claims the simpler ones are more likely. (Although one could justify it like this: take an incomplete theory; add one detail; then another; at each step you have to pick among many details you might add, so the more details you add, the more likely you are to pick a wrong one (remember you haven’t tested the successive theories yet); thus, the more complex your theory, the likelier you are to be wrong.)
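That parenthetical argument can be made slightly more precise (a rough sketch, under the added assumption that each detail is chosen correctly, independently, with some probability p < 1):

```latex
% Assume each added, untested detail is chosen correctly with probability p < 1,
% independently of the others. A theory built from n such details is then
% entirely correct with probability
\[
  \Pr(\text{all } n \text{ details correct}) = p^{n},
\]
% which shrinks as n grows: the more complex the theory, the less likely it is
% to be right a priori.
```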