As Unknown pointed out, using Kolmogorov complexity is a better bet if winning means finding the truth quickly, instead of just eventually.
The rules of this game, plus the assumption that Nature is maximally adversarial, seem specifically designed so that this particular version of Ockham would be optimal. It doesn’t really seem to provide much insight into what one should do in more general/typical settings.
The question that Kelly is trying to answer is “Why, or in what sense, does Occam’s razor work?”. Yes, the answer is “It works in that it is worst-case-efficient.” He doesn’t assume a simplicity-biased prior (which some might characterize as a circular justification of Occam’s razor).
I worry that the particular example that I’m presenting here (where the Occam strategy steps in a vaguely linear fashion) is coloring your thinking about Kelly’s work generally, which has little or nothing to do with stepping in a vaguely linear fashion, and everything to do with believing the simplest theory compatible with the evidence. (Which, based on your other writings, I think you would endorse.)
Please investigate further, perhaps by reading one of Kelly’s papers; I would hate for my poor writing skills to mislead you.
He doesn’t assume a simplicity-biased prior (which some might characterize as a circular justification of Occam’s razor).
Is the particles example one of Kelly’s own examples, or something you made up to explain his idea? Because in that example at least, we seem to be assuming a game with very specific rules, showing that a particular strategy is optimal for that game, and then calling that strategy “Occam”. Compare this with Solomonoff induction, where we only have to assume that the input is computable and can show that Bayesian updating with a prior based on Kolmogorov complexity is optimal in a certain sense.
everything to do with believing the simplest theory compatible with the evidence. (Which, based on your other writings, I think you would endorse.)
I do endorse that, but there are various aspects of the idea that I’m still confused about, and I’m not seeing how Kelly’s work helps to dissolve those confusions.
I think I read some of Kelly’s own writings the last time cousin_it pointed them out, but again, didn’t really “get it” (in the sense of seeing why it’s important/interesting/insightful). I’m hoping that I’m just missing something, and you, or someone else, can show me what it is. Or perhaps just point to a specific paper that explains Kelly’s insights most clearly?
The particles example is derived from one of Kelly’s examples (marbles in a box).
Kolmogorov complexity is unavailable; only approximations to it are available, and they’re not unique. There are multiple reasonable grammars that we can use to do MDL, and no clear justification for why we should use one rather than another. For an extreme example, imagine measuring simplicity by “size of the patch to the Catholic Church’s dogma.”
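To make the coding-dependence worry concrete, here is a minimal toy sketch. It is not Kelly's construction or a real MDL implementation; the two encodings, the `dogma` reference value, and all function names are hypothetical, chosen only to show that two codes can rank the same hypotheses differently.

```python
from typing import List


def bits(k: int) -> int:
    """Length of a plain binary code for a non-negative integer (at least 1 bit)."""
    return max(1, k.bit_length())


def length_from_zero(n: int) -> int:
    """Hypothetical code A: describe the hypothesis 'n particles' by encoding n itself."""
    return bits(n)


def length_as_patch(n: int, dogma: int = 7) -> int:
    """Hypothetical code B: describe 'n particles' as a patch to a fixed baseline belief."""
    return bits(abs(n - dogma))


hypotheses: List[int] = [0, 1, 3, 7, 8]

# The "simplest" hypothesis depends entirely on which code we picked.
simplest_a = min(hypotheses, key=length_from_zero)
simplest_b = min(hypotheses, key=length_as_patch)
print(simplest_a, simplest_b)
```

Under code A the shortest description belongs to the hypothesis of zero particles; under code B it belongs to whatever the baseline already asserts. Nothing inside either code says which one is the right measure of simplicity.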
Kelly’s notion that Occam means worst-case-efficient-in-mind-changes allows us to avoid the prior “dropping like manna from heaven”.
In the first paper you cite, there is a particles example that is essentially the same example as yours, and Kelly does use it as his main example. The only difference is instead of counting the number of fundamental particles in physics, his example uses a device that we know will emit a finite number of particles.
On page 33, Kelly writes about how his idea would handle a modification of the basic example:
For example, suppose you have to report not only how many particles will appear but when each one will appear [...] The only sensible resolution of this Babel of alternative coarsenings is for Ockham to steer clear of it altogether. And that’s just what the proposed theory says. [...] Ockham is both mute and toothless (perhaps mute because toothless) in this problem.
Again, that is the correct answer.
So suppose this device has been emitting one particle every 10 seconds for the last million seconds. According to Kelly’s version of Ockham’s Razor (perhaps we should call it Kelly’s Razor instead?), we can’t predict that the next particle will come 10 seconds later. What use is Kelly’s idea, if I want to have a notion of complexity that can help us (or an AI) make decisions in general, instead of just playing some specific games for which it happens to apply?
You read the paper! Thanks for pointing out that we know somehow that only a finite number of particles will ever be found.
To explain the “oneicle” problem: It seems like how a scenario is coded into a game matters. For example, if you viewed the timed particles game as having two possible worlds “The device will always emit a particle every 10 seconds.” and “The device will sometimes emit a particle every 10 seconds.”, then the first world cannot pretend to be the second world, but the second world can camouflage itself as the first world for a time, and so (Kelly’s version of) Occam’s razor says the first is simpler—we get the intuitively correct answer.
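The camouflage asymmetry in this two-world coding can be sketched as a small game. This is only an illustration under stated assumptions, not Kelly's formalism: the learner names (`occam`, `make_skeptic`), the `patience` parameter, and the simple adversary are all made up here. Nature emits a particle on every tick until the learner commits to "always", and then skips one emission.

```python
from typing import Callable, List

ALWAYS, SOMETIMES = "always", "sometimes"
Learner = Callable[[List[bool]], str]


def occam(history: List[bool]) -> str:
    """Guess 'always' until a missed emission has been observed."""
    return ALWAYS if all(history) else SOMETIMES


def make_skeptic(patience: int) -> Learner:
    """Guess 'sometimes' first; concede 'always' after `patience` clean ticks.

    A skeptic that never concedes would never converge to the truth in the
    'always' world, so some finite patience is assumed.
    """
    def skeptic(history: List[bool]) -> str:
        if not all(history):
            return SOMETIMES
        return ALWAYS if len(history) >= patience else SOMETIMES
    return skeptic


def adversarial_mind_changes(learner: Learner, horizon: int = 100) -> int:
    """Count retractions against a Nature that skips once, right after 'always' is guessed."""
    history: List[bool] = []
    guesses = [learner(history)]
    skipped = False
    for _ in range(horizon):
        emit = True
        if not skipped and guesses[-1] == ALWAYS:
            emit = False  # the 'sometimes' world drops its camouflage
            skipped = True
        history.append(emit)
        guesses.append(learner(history))
    return sum(a != b for a, b in zip(guesses, guesses[1:]))


print(adversarial_mind_changes(occam))
print(adversarial_mind_changes(make_skeptic(5)))
```

In this toy setup the Occam learner can be forced to change its mind once, while the learner that starts with the more complex world can be forced to change twice: once when it concedes, and again when Nature then skips an emission.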
The alternative coding is somewhat analogous to the color “grue” (which is green up until some date, and blue thereafter). You recode the problem to talk about “oneicles”, a concept that refers to non-particles up to time 1, and particles thereafter. If you allow this sort of recoding, then you would also allow “twoticles”, and the infinite hierarchy of symmetric re-codings causes a problem. I tend to think this is a technical problem that is unlikely to expand into the philosophy part of the theory, but I’m kind of an idiot, and I may be missing something. Certainly we would like to avoid coding-dependence.
That’s a problem (the first problem Kelly mentioned in that paper), but do you really require a theory to have no problems remaining in order for it to be counted as insightful? No one else addresses the question “Where does the prior come from?”.
do you really require a theory to have no problems remaining in order for it to be counted as insightful?
It would be one thing if Kelly said that the theory currently can’t predict that another particle will come in 10 seconds, but he hopes to eventually extend it so that it can make predictions like that. But instead he says that Ockham is mute on the question, and that’s the right answer.
No one else addresses the question “Where does the prior come from?”.
Neither does Kelly. I don’t see how we can go from his idea of Ockham to a Bayesian prior, or how to use it directly in decision making. Kelly’s position above suggests that he doesn’t consider this to be the problem that he’s trying to solve. (And I don’t see what is so interesting about the problem that he is trying to solve.)
Okay, I think we’ve reached a point of reflective disagreement.
I agree with you that Kelly was wrong to be enamored of his formalization’s output on the timed particles example; it’s either a regrettable flaw that must be lived with, or a regrettable flaw that we should try to fix, and I don’t understand enough of the topological math to tell which.
However, the unjustified Occam prior in the standard Bayesian account of science is also a regrettable flaw—and Kelly has demonstrated that it’s probably fixable. I find that very intriguing, and am willing to put some time into understanding Kelly’s approach—even if it dissolves something that I previously cherished (such as MDL-based Occam priors).
Reasonable people can reasonably disagree regarding which research avenues are likely to be valuable.
I am very late to the discussion. I have not read Kelly’s papers in detail, so pardon me if my question betrays a fundamental misunderstanding of what you wrote: How can “(Kelly’s version of) Occam’s razor says the first [world] is simpler” and give us “the intuitively correct answer” if an infinite number of particles will be emitted in the first world, even though Kelly has already specified that the device will only emit a finite number of particles?
I recommend this paper:
http://www.hss.cmu.edu/philosophy/kelly/papers/bonn5.pdf
but if that one doesn’t do it for you, there are others with more cartoons and less words:
http://www.fitelson.org/few/few_05/kelly_2.pdf
Or ones with more words and less cartoons:
http://www.hss.cmu.edu/philosophy/kelly/papers/Ch4-Glymour%20&%20Kelly-final.pdf
The statements, though contradictory, refer to two different thought experiments.
I see. Thanks for the explanation.