Your proof can’t be right because it doesn’t use the concept of “complexity” in any non-trivial manner. If we replace “complexity” with “flubbity”, so that there’s only a finite number of hypotheses for any given “flubbity”, your proof will still go through.
For some actual work on justifying Occam’s Razor, see Kevin T. Kelly’s Ockham Efficiency results. Kelly unpacks “complexity” in a nontrivial way that isn’t just description length.
ETA: see this earlier discussion about learning and simplicity, it’s very relevant to the topic of your interest.
Consider a creature with a prior of the type described in this post. Then there is some concept of “foobity”, such that hypotheses with higher “foobity” are assigned smaller weights. The creature will find that it follows Occam’s Razor, if it does not have a separate concept of “complexity” such that “complexity” != “foobity”. But why would it, unless there was some reason for that to be the case? “Typically” there would be no reason for the creature to have an explicit concept of complexity that’s different from the concept of complexity implicit in its prior.
Assuming that’s the case, its concept of complexity may still seem very strange and convoluted to us or to some other creature, but to that creature it will appear to be perfectly natural, since there’s nothing else to judge it by.
Could we just be such a creature? Intuitively, the answer seems to be no. Our concept of complexity seems to be natural in some absolute, objective sense, and not just relative to itself. But why is it so hard to pin that down?
Circular/anthropic arguments are seductive, but invariably turn out to be flawed because they predict less order and more narrowly-averted chaotic doom than we actually observe. Compared to our value system, which is genuinely a product of many evolutionary accidents, our concept of complexity is too simple because it can be captured (albeit imperfectly) by Turing machines. In other words, a creature using a randomly-evolved concept of “foobity” wouldn’t be able to approach it with simple math, as we do.
I think it’s a mistake to reach for the anthropic hammer whenever you see a confusing problem. The most extreme example was when Nesov produced a “proof” that particle physics is a Darwinian construct, after which (thankfully) the absurdity heuristic finally kicked in.
But what do you mean by “simple” math? Simple according to what, if not “foobity”?
ETA: I looked at Nesov’s comment about particle physics, and didn’t understand it. Can you explain?
Re ETA: Nesov said that particle physics is this way because you only care about the worlds where it is this way. Just like your explanation of probabilities. :-)
A correction (though I mixed that up in comments too): what we anticipate is not necessarily linked to what we care about. Particle physics is this way because we anticipate worlds in which it’s this way, but we may well care about other worlds in which it isn’t.
Anticipation is about what we can control (as evolution saw the possibility, based on the past in the same world), not what we want to happen. Since evolution is causal, we don’t anticipate acausal control, but we can care about acausal control.
The useful conclusion seems to be that the concept of anticipation (and hence, of reality/particle physics) is not fundamental in the decision-theory sense; it’s more like the concept of hunger: something we can feel and can have accurate theories about, but which doesn’t answer questions about the nature of goodness.
Don’t know about you, but I anticipate acausal control, to a degree. I have a draft post titled “Taking UDT Seriously” featuring such shining examples as: if a bully attacks you, you should try to do maximum damage while disregarding any harm to yourself, because it’s good for you to be predicted as such a person. UDT is seriously scary when applied to daily life, even without superintelligences.
I don’t think UDT implies this (or rather, a variant of UDT that applies to humans, which nobody has really formulated yet, since the original UDT assumed that one has access to one’s own source code). The difference between P(bully predicts me as causing a lot of damage | I try to cause maximum damage) and P(bully predicts me as causing a lot of damage | I don’t try to cause maximum damage) seems quite small, because the bully can’t see or predict my source code and also can’t do a very good job of simulating or predicting my decisions. Meanwhile, the negative consequences of trying to cause maximum damage seem quite high if the bully fails to be preemptively dissuaded (e.g., being arrested, sued, disciplined, or retaliated against).
(Not sure if you still endorse this comment, 9 years later, but I sometimes see what I consider to be overly enthusiastic applications of UDT, and as the person most associated with UDT I feel an obligation to push against that.)
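A back-of-the-envelope rendering of that comparison; every number below is an illustrative assumption of mine, not a figure from the thread:

```python
# Illustrative expected-utility comparison for the two policies. All payoffs
# and probabilities are made-up placeholders, chosen only to show the shape of
# the argument: a small shift in the bully's prediction buys little, while the
# downside of all-out retaliation (injury, arrest, lawsuits) is large.
u_deterred        = 0.0     # bully predicts you're dangerous and never attacks
u_attacked_normal = -10.0   # attacked, respond proportionately
u_attacked_allout = -60.0   # attacked, fight to do maximum damage

p_deterred_given_allout_policy = 0.15   # prediction shifts only slightly...
p_deterred_given_normal_policy = 0.10   # ...since he can't read your source code

eu_allout = (p_deterred_given_allout_policy * u_deterred
             + (1 - p_deterred_given_allout_policy) * u_attacked_allout)
eu_normal = (p_deterred_given_normal_policy * u_deterred
             + (1 - p_deterred_given_normal_policy) * u_attacked_normal)

print(eu_allout, eu_normal)   # -51.0 vs. -9.0 under these assumptions
```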
Can you post this in the discussion area?
You seem to be mixing up ambient control within a single possible world with assignment of probability measure to the set of possible worlds (which anticipation is all about). You control the bully by being expected (credibly threatening) to retaliate within a single possible world. Acausal control is about controlling one possible world from another, while ambient (logical) control is about deciding the way your possible world will turn out (what you discussed in the recent posts).
More generally, logical control can be used to determine an arbitrary concept, including that of utility of all possible worlds considered together, or of all mathematical structures. Acausal control is just a specific way in which logical control can happen.
Yep. I can’t seem to memorize the correct use of our new terminology (acausal/ambient/logical/etc), so I just use “acausal” as an informal umbrella term for all kinds of winning behavior that don’t seem to be recommended by CDT from the agent’s narrow point of view. Like one-boxing in Newcomb’s Problem, or being ready to fight in order to release yet-undiscovered pheromones or something.
“Correct” is too strong a descriptor; it’s mostly just me pushing standardization of terminology, based on how it seems to have been used in the past.
Sorry, I misparsed your comment and gave a wrong answer, which I then deleted.
Your original comment was trivially correct, and my reply missed the point. We can never justify our concept of complexity by thinking like that—linguistically—because this would be like trying to justify our prior with our prior, “a priori”. If my prior is based on complexity and Bob’s prior is based on foobity (religion, or whatever), we will find each other’s priors weird. So if you ask whether all imaginable creatures have to use our concept of complexity, the easy answer is no. Instead we look at the outside world and note that our brand of razor seems to work. When it doesn’t (religion, or whatever), we update it. Is there any other aspect to your question that I missed?
Let’s call our brand of razor together with the algorithm we use to update it (using what we see from the outside world) our “meta-razor”. Now is this “meta-razor” just a kind of “foobity”, i.e., an arbitrary notion that we just happen to have, or is there something objective about it?
I spent some time thinking about your question and cannot give an answer until I understand better what you mean by objective vs arbitrary.
The concept of complexity looks objective enough in the mathematical sense. Then, if I understand you correctly, you take a step back and say that mathematics itself (including logic, I presume?) is a random concept, so other beings could have wildly different “foomatics” that they find completely clear and intuitive. With the standards thus raised, what kind of argument could ever show you that something is “objective”? This isn’t even the problem of induction, this is… I’m at a loss for words. Why do you even bother with Tegmark’s multiverse then? Why not say instead that “existence” is a random insular human concept, and our crystalloid friends could have a completely different concept of “fooistence”? Where’s the ground floor?
Here’s a question to condense the issue somewhat. What do you think about Bayesian updating? Is it “objective” enough?
Perhaps asking that question wasn’t the best way to make my point. Let me try to be more explicit. Intuitively, “complexity” seems to be an absolute, objective concept. But all of the formalizations we have of it so far contain a relativized core. In Bayesian updating, it’s the prior. In Kolmogorov complexity, it’s the universal Turing machine. If we use “simple math”, it would be the language we use to talk about math.
This failure to pin down an objective notion of complexity causes me to question the intuition that complexity is objective. I’d probably change my mind if someone came up with a “reasonable” formalization that’s not “relative to something.”
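For concreteness, here is a minimal toy sketch of that relativity; the two off-the-shelf compressors stand in (very loosely) for two different reference machines, and nothing here is a formalization anyone in the thread proposed:

```python
# Toy illustration: "description length" of the same data under two different
# reference encodings, standing in for two different universal Turing machines.
# The absolute numbers you get, and in principle even the ordering, depend on
# which reference you fixed in advance.
import bz2
import zlib

regular = b"0101" * 64           # a highly patterned 256-byte string
irregular = bytes(range(256))    # a less patterned string of the same length

for name, blob in [("regular", regular), ("irregular", irregular)]:
    print(f"{name}: zlib={len(zlib.compress(blob))} bytes, "
          f"bz2={len(bz2.compress(blob))} bytes")
```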
Implementable on a machine during my lifetime. That’s got to be an objective property, at least I’m wondering how you will spin it to sound subjective :-)
Edit: whoops, sorry, this is wrong. Don’t bother answering.
Short computer programs, compared to the ones that would encode our concepts of “beauty” or “fairness”, say.
The fact that the proof uses only very weak properties of the notion of “complexity” does not show that the proof is invalid. In fact, it suggests (not with certainty) the opposite: that an even stronger result could be proven.
Thanks and apologies, I was being careless when I said the proof “couldn’t be right”. It may be formally right, but fail to explain why we should use Occam’s Razor instead of the Flubby Razor.
Yes a statement like “on average, X holds” needs to be quantitative in order to be useful in practice. Still, this is a valuable argument, especially as far as it clarifies what needs to be done to reach a quantitative refinement.
The proof does allow for different Razors depending on your specific definition of complexity. It’s true that this makes the statement fairly weak. But it is also in a way a virtue, because it corresponds with our actual use of the Razor. There are in fact different definitions of complexity and we use different ones in different contexts; we learn from experience which contexts require which Razors. For example, if you want to describe the path of a ball moving through the air, you expect to be able to describe it with some fairly simple mathematics, and you think this more likely than complicated mathematical descriptions.
On the other hand, if you see a precise human-shaped footprint in the mud in your yard, saying “a human did it” is very complicated mathematically, since the mathematical description would include the description of a human being. So in this way, “Wind and weather did it” should be simpler, in the mathematical way. Nonetheless, you use a different Razor and say that with this Razor, it is simpler to say that a human being did it.
The last paragraph feels wrong to me. This is the explanation for that feeling that came to mind first, but I’m not positive it’s my true objection:
If your only observation were the footprint, sure—but your observations also include that you live on a planet with 7 billion people who leave footprints like that—so despite the comparative algorithmic simplicity of the weather model, it doesn’t match the observations as well.
Yes, it is likely that a stronger result is possible. However, it is difficult to make it stronger while satisfying my conditions (true in all possible worlds; true according to every logically consistent assignment of priors).
Interestingly, flubbity will correlate with complexity, regardless of how you define it. This is for pretty much the same reason as the inverse correlation of complexity and probability.
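One way to spell that reason out, as a sketch of my own using only the post’s assumption that each complexity value and each flubbity value is shared by finitely many hypotheses:

```latex
Let $C(h)$ denote complexity and $F(h)$ flubbity, each taking natural-number
values with only finitely many hypotheses at any given value. Fix any bound
$B$. Then
\[
  \{\, h : F(h) \le B \,\} \;=\; \bigcup_{k \le B} \{\, h : F(h) = k \,\}
\]
is a finite set, so it contains a hypothesis of maximal complexity, say $c_B$.
Hence $C(h) > c_B$ implies $F(h) > B$; that is, flubbity tends to infinity as
complexity does. An analogous argument, with summability of the prior in place
of the finiteness of levels, gives the inverse relation between complexity and
probability referred to above.
```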
I don’t understand your objection. Yes, it will apply to other attributes as well. I don’t see how that prevents it from applying to complexity.
If the objection is that I didn’t describe complexity in detail, this doesn’t matter, precisely because the proof will still go through regardless of how much or little is added.
It does prevent the proof from applying to complexity. Which hypothesis would you choose: the simpler one, or the less flubby one?
On average you should choose the simpler of two hypotheses compared according to simplicity, and again, on average you should choose the less flubby of two hypotheses compared according to flubbiness.
Or to put this in more concrete terms: the Razor will be true if it is understood to mean that you should choose the hypothesis that can be described by the shorter program, and it will also be true if it is understood to mean that you should choose the hypothesis that has the shorter English description.
Yes, these can on particular occasions be opposed to one another, and you would have to ask which rule is better: choose the shorter English description, or choose the shorter program? My proof does not answer this question, but it doesn’t have to, because both rules measure some kind of complexity, and the Razor is true whether it is taken in one way or the other.
Flubbity may not have much to do with complexity. In fact it can be opposed to complexity, except in the limit for extremely complex/flubby hypotheses. For example, you may say that flubbity = 1,000,000 − complexity for complexity < 1,000,000, and flubbity = complexity elsewhere. Your proof will go through just fine, but in our world (which probably doesn’t need such huge hypotheses) it will lead to the opposite of Occam’s Razor. You don’t always have the luxury of letting your parameter go to infinity.
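To make the construction concrete, a toy sketch (the threshold is the comment’s illustrative number; the function name and the comparison are mine):

```python
# A toy version of the comment's construction. Assuming, as in the post, that
# only finitely many hypotheses share any given complexity value, only finitely
# many share any given flubbity value either, so the proof's premise holds for
# flubbity. Yet for the hypothesis sizes we actually encounter, a prior that
# decreases with flubbity rewards complexity.
THRESHOLD = 1_000_000

def flubbity(complexity: int) -> int:
    """Flubbity as defined in the comment above."""
    return THRESHOLD - complexity if complexity < THRESHOLD else complexity

# Any prior that is monotone decreasing in flubbity therefore favours the
# MORE complex of two everyday-sized hypotheses:
print(flubbity(1000) < flubbity(10))   # True: complexity 1000 gets more weight
```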
By Occam’s razor?
The proof follows on average. Naturally you can construct artificial examples that make fun of it, but there can be no proof of the Razor which is not based on averages, since in fact, it happens on various occasions that the more complex hypothesis is more correct than the simpler hypothesis.
I don’t object to the formal correctness of the proof, but the statement it proves is way too weak. Ideally we’d want something that works for complexity but not flubbity. For any Occamian prior you care to build, I can take the first few hypotheses that comprise 99% of its weight, build a new prior that assigns them a weight of 1e-20 combined, and claim it’s just as good as yours by Occamian lights.
If we removed the words “on average” from the formulation of your theorem, we’d have a stronger and more useful statement. Kelly’s work shows an approach to proving it not just “on average”, but for all possible hypothesis lengths.
ETA: I apologize for not objecting to the formal side of things. I just read the proof once again and failed to understand what it even means by “on average”.
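A toy rendering of the reweighting move described above; the 99% and 1e-20 figures are from the comment, while the length-based starting prior and the helper names are my own illustrative choices:

```python
# Start from a toy "Occamian" prior, crush its head to a combined weight of
# 1e-20, and renormalize. The result still sends probability to zero as the
# hypothesis index grows, so it satisfies an "on average" Razor, while being
# useless for actually preferring the simple hypotheses we care about.
def occam_weights(n: int) -> list[float]:
    """Toy length-based prior over hypotheses 0..n-1: weight ~ 2^-(i+1)."""
    w = [2.0 ** -(i + 1) for i in range(n)]
    total = sum(w)
    return [x / total for x in w]

def crush_head(weights: list[float], head: int, head_mass: float = 1e-20) -> list[float]:
    """Reassign the first `head` hypotheses a combined mass of `head_mass`."""
    head_total = sum(weights[:head])
    tail_total = sum(weights[head:])
    return ([x / head_total * head_mass for x in weights[:head]]
            + [x / tail_total * (1.0 - head_mass) for x in weights[head:]])

w = occam_weights(50)
print(sum(w[:7]) > 0.99)           # the first few hypotheses carry over 99%
w2 = crush_head(w, head=7)
print(sum(w2[:7]))                 # ~1e-20: practically ruled out
print(abs(sum(w2) - 1.0) < 1e-9)   # still a probability distribution
```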
I started reading some of Kelly’s work, and it isn’t trying to prove that the less complex hypothesis is more likely to be true, but that by starting from it you converge on the truth more quickly. I’m sure this is right but it isn’t what I was looking for.
Yes, the statement is weak. But this is partly because I wanted a proof which would be 1) valid in all possible worlds; 2) valid according to every logically consistent assignment of priors. It may be that even with these conditions, a stronger proof is possible. But I’m skeptical that a much stronger proof is possible, because it seems to be logically consistent for someone to say that he assigns a probability of 99% to a hypothesis that has a complexity of 1,000,000, and distributes the remaining 1% among the remaining hypotheses.
This is also why I said “on average.” I couldn’t remove the words “on average” and assert that a more complex statement is always less probable without imposing a condition on the choice of prior which does not seem to be logically necessary. The meaning of “on average” in the statement of the Razor is that in the limit, as the complexity tends to infinity, the probability necessarily tends to zero: given any probability x, say 0.000001, there will be some complexity value z such that all statements of complexity equal to or greater than z have a probability less than x.
I will read the article you linked to.
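For what it’s worth, the limit statement above does follow from the post’s assumption that each complexity value is shared by only finitely many hypotheses; here is a compressed version of the argument in my own phrasing:

```latex
Let $p$ be any prior. For any $x > 0$, the set $\{\, h : p(h) \ge x \,\}$
contains at most $\lfloor 1/x \rfloor$ hypotheses, since the probabilities sum
to at most $1$. Being finite, it has a maximum complexity; call it $z - 1$.
Then
\[
  \mathrm{complexity}(h) \ge z \;\Longrightarrow\; p(h) < x ,
\]
which is exactly the sense in which probability tends to zero as complexity
tends to infinity.
```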
Why do you want the theorem to hold for every logically consistent prior? This looks backwards. Occamian reasoning should show why some prior distributions work better than others, not say they’re all equally good. For example, the Solomonoff prior is one possible formalization of Occam’s Razor.
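For reference, one standard way the (discrete) Solomonoff prior is written, with U a prefix-free universal Turing machine; the choice of U is exactly the “relative to something” that the earlier comments point at:

```latex
\[
  m(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-\lvert p \rvert}
  \;\ge\; 2^{-K_U(x)} ,
\]
where $K_U(x)$ is the length of the shortest program that outputs $x$ on $U$,
so hypotheses with shorter shortest descriptions receive more prior weight.
```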
Because for every logically consistent prior, there should be a logically possible world where that prior works well. If there isn’t, and you can prove this to me, then I would exclude priors that don’t work well in any possible world.
I want it to apply to every possible world because if we understand the Razor in such a way that it doesn’t apply in every possible world, then the fact that the Razor works well is a contingent fact. If this is the case there can’t be any conclusive proof of it, nor does it seem that there can be any ultimate reason why the Razor works well except “we happen to be in one of the possible worlds where it works well.” Yes, there could be many interpretations which are more practical in our actual world, but I was more interested in an interpretation which is necessary in principle.
This is even more backwards. There are logically possible worlds where an overseer god punishes everyone who uses Bayesian updating. Does this mean we should stop doing science? Looking for “non-contingent” facts and “ultimate” reasons strikes me as a very unfruitful area of research.
Different people have different interests.
How do you know when that happens?
My point is that if someone has a higher prior for the more complex hypothesis, which turns out to be correct, you cannot object to his prior by saying “How did you know that you should use a higher prior?”, since people do not justify their priors. Otherwise they wouldn’t be priors.
A major use (if not the whole point) of Occam’s razor is to have a rational basis for priors.
If people don’t have to justify their priors, then why have a process for generating them at all?
If I create an encoding with ‘God’ as a low complexity explanation, would you say I am being rational?
But the point of my question above was that you find out that the more complex hypothesis is correct when you get evidence for it. Juggling your priors is not the way to do it. (In fact, it probably invites accidentally counting evidence twice.)
This is spot-on. Furthermore, such a lax definition implies that certain hypotheses will have “probability” zero. If our language is English, the explanation “PUJF;FDAS!;FDS?” could be assigned probability zero. While this does not guarantee that the set of possible explanations is finite, neither does the author prove that the set of explanations with nonzero probability is infinite, and if it is not, this proof is largely useless.
Also, interestingly, the rules do not address the, “A wizard did it!” problem with complexity, though that is likely far beyond the attempted scope.