I think there’s a difference between looking at a theory as data versus looking at it as code.
You look at a theory as code when you need to use it to predict the future of something it describes (e.g., will it rain). For this purpose, theories that generate the same predictions are equivalent; you don’t care about their size. In fact, even theories with different predictions can be treated as equivalent, as long as their predictions are close enough for your purpose. (See Newtonian vs. relativistic physics applied to predicting kitchen-sink performance.) You do care about how fast you can run them, though.
However, you look at a theory as data when you need to reason about theories and “make predictions” about them, particularly about unknown theories related to known ones. As long as two theories make exactly the same predictions, you have little reason to reason about them. But if they predict differently for something you haven’t tested yet but will test in the future, and you need to take an action now whose outcome depends on the result of that future test (a simple example: a bet), then you need to guess which theory is more likely.
You need something like a meta-theory that predicts which of the two is more likely to be true. Occam’s razor is one of those meta-theories.
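To make the meta-theory idea concrete, here is a toy sketch of Occam’s razor as a rule for choosing between two theories that both fit the data seen so far. The complexity penalty used (a prior weight of 2^−length, with description length in bits standing in for simplicity) is my illustrative assumption, not something from the comment above; it’s loosely in the spirit of a Solomonoff-style prior.

```python
# Toy sketch: Occam's razor as a meta-theory that assigns a prior
# weight to each theory based on its description length in bits.
# The 2**-bits penalty is an illustrative assumption, not a claim
# about how the razor must be formalized.

def occam_weight(description_length_bits: int) -> float:
    """Prior weight of a theory: shorter descriptions weigh more."""
    return 2.0 ** -description_length_bits

def prefer(theory_a_bits: int, theory_b_bits: int) -> str:
    """Given two theories that fit all data so far, bet on the
    one with the higher complexity-penalized prior."""
    wa = occam_weight(theory_a_bits)
    wb = occam_weight(theory_b_bits)
    return "A" if wa >= wb else "B"

print(prefer(100, 150))  # the shorter theory gets the higher prior
```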
Thinking about it more, this isn’t quite a disagreement with the post immediately above; it’s not immediately obvious to me that a simpler theory is easier to reason about (though intuition says it should be). But I don’t think Occam’s razor is about how easy it is to reason about theories; it just claims simpler ones are more likely. (Although one could justify it like this: take an incomplete theory and add details one by one. At each step you have to pick among many details you might add, so the more details you add, the more likely you are to pick a wrong one (remember, you haven’t tested the successive theories yet); thus, the more complex your theory, the more likely it is to be wrong.)
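The parenthetical argument can be sketched numerically. Assume (purely for illustration) that at each step there are k candidate details and only one of them is correct; then a theory built from n such details is entirely right with probability (1/k)^n, which shrinks exponentially as details pile up.

```python
# Toy model of the argument above: each added detail is a choice
# among k candidates, of which only one is correct (an assumption
# made for illustration). The chance that an n-detail theory is
# entirely right is then (1/k)**n.

def p_correct(n_details: int, k_candidates: int) -> float:
    """Probability that all n independently chosen details are right."""
    return (1.0 / k_candidates) ** n_details

for n in (1, 2, 5):
    print(n, p_correct(n, k_candidates=4))
```

So under these (admittedly crude) assumptions, a five-detail theory is already right less than 0.1% of the time, which is one way to cash out “more complex means more likely to be wrong.”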