There are two times when Occam’s razor comes to mind. One is for addressing “crazy” ideas à la “The witch down the road did it”, and the other is for picking which legitimate-seeming hypothesis to prioritize in some scientific context.
For the first one, I really like Eliezer’s reminder that when going with “The witch did it” you have to include the observed data in your explanation.
For the second one, I’ve been thinking about the simplicity formulation that one of my professors uses. Roughly: A is simpler than B if the set of data consistent with A is a subset of the set of data consistent with B.
His motivation for using this notion has to do with minimizing the number of times you are forced to update.
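To make the rough version concrete, here’s a minimal sketch in Python, under the assumption that a hypothesis can be identified with the set of observations consistent with it (the toy hypotheses are mine, not his):

```python
def simpler_than(a: set, b: set) -> bool:
    """A is simpler than B if every observation consistent with A
    is also consistent with B (i.e., subset inclusion)."""
    return a <= b

# Toy example: hypothesis A permits only even outcomes,
# hypothesis B permits any outcome in range(10).
A = {0, 2, 4, 6, 8}
B = set(range(10))

print(simpler_than(A, B))  # True: everything A allows, B also allows
print(simpler_than(B, A))  # False
```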
> Roughly: A is simpler than B if the set of data consistent with A is a subset of the set of data consistent with B.
Maybe the less rough version is better, but this seems like a really bad formulation. Consider (a) an exact enumeration of every event that ever happened, making no prediction of the future, vs (b) the true laws of physics and the true initial conditions, correctly predicting every event that ever happened and every event that will happen.
Intuitively, (b) is simpler to specify, and we definitely want to assign (b) a higher prior probability. But according to this formulation, (a) is simpler, since all future events are consistent with (a), while almost none are consistent with (b). Since both theories have equally much evidence, we’d be forced to assign higher probability to (a).
I think adding some more details will clear things up.
The setup presupposes a certain amount of realism. Start with Possible Worlds Semantics, where logical propositions are attached to / refer to the set of possible worlds in which they are true. A hypothesis is some proposition. We think of receiving data as learning some proposition (in practice, the propositions you can learn are shaped by the methods and tools you have for looking at and measuring the world), which narrows down the set of possible worlds consistent with the data.
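Here’s a minimal sketch of that picture, assuming a small finite set of worlds (the labels are placeholders):

```python
worlds = {"w1", "w2", "w3", "w4"}

# A proposition is identified with the set of worlds in which it is true.
hypothesis = {"w1", "w2"}   # some hypothesis H
datum = {"w1", "w3"}        # the proposition the data hands us

# Receiving the datum narrows the allowable worlds to those consistent with it.
allowable = worlds & datum  # {"w1", "w3"}

# H is still live iff some allowable world makes it true.
still_live = bool(hypothesis & allowable)  # True: w1 survives
```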
Now for the part that I think addresses what you were getting at. I don’t think there’s a direct analog in my setup to your (a). You could consider the hypothesis/proposition “the set of all worlds compatible with the data I have right now”, but that’s not quite the same. I have more thoughts, but first: do you still feel like your idea is relevant to the setup I’ve described?
That does seem to change things… Although I’m confused about what simplicity is supposed to refer to, now.
In a pure Bayesian version of this setup, I think you’d want some simplicity prior over the worlds, and then discard inconsistent worlds and renormalize every time you encounter new data. But you’re not speaking about simplicity of worlds, you’re speaking about simplicity of propositions, right?
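Concretely, I’m imagining something like this sketch, with a made-up prior over four stand-in worlds:

```python
prior = {"w1": 0.50, "w2": 0.25, "w3": 0.15, "w4": 0.10}

def update(belief: dict, datum: set) -> dict:
    """Discard worlds inconsistent with the datum, then renormalize."""
    surviving = {w: p for w, p in belief.items() if w in datum}
    total = sum(surviving.values())
    return {w: p / total for w, p in surviving.items()}

# Each new datum is a proposition: the set of worlds it leaves open.
posterior = update(prior, {"w1", "w2", "w3"})  # rules out w4
posterior = update(posterior, {"w2", "w3"})    # then rules out w1
# posterior is now {"w2": 0.625, "w3": 0.375} (up to float rounding)
```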
Since a proposition is just a set of worlds, I guess you’re speaking about the combined simplicity of all those worlds. And it makes sense that that would increase if the proposition is consistent with more worlds, since any of those worlds would indeed make the proposition true.
So now I’m at “The simplicity of a proposition is proportional to the prior-weighted number of worlds that it’s consistent with”. That’s starting to sound closer, but you seem to be saying that “The simplicity of a proposition is proportional to the number of other propositions that it’s consistent with”? I don’t understand that yet.
(Also, in my formulation we need some other kind of simplicity for the simplicity prior.)
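In symbols, the two readings I’m trying to keep apart (notation mine, and the second is only my guess at what you mean) are roughly:

$$\operatorname{simp}_1(P) \;\propto\; \sum_{w \in P} \operatorname{prior}(w) \qquad \text{vs.} \qquad \operatorname{simp}_2(P) \;\propto\; \bigl|\{\, Q : Q \cap P \neq \emptyset \,\}\bigr|$$

where a proposition Q counts as “consistent with” P when the two share at least one world.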
I’m currently turning my notes from this class into some posts, and I’ll wait to continue this until I’m able to get those up. Then, hopefully, it will be easier to see if this notion of simplicity is lacking. I’ll let you know when that’s done.