Probability “models” frequency in the sense that sometimes frequency data dominates all of our other knowledge about some phenomenon.
No, probability models frequency in the sense that there is an interpretation of the Kolmogorov axioms which mentions only terms from the part of our language used to talk about frequency, and all Kolmogorov theorems come out as true statements about frequency under this interpretation.
I mean, literally, Bayes is an arithmetic of odds and fractions; of course it models frequency, at least as well as fractions and odds do. Probability is a frequency as often as it is a fraction or an odds.
So you are saying that as long as frequentists understand that Bayesian methods are theoretically ideal and cannot be improved upon, whereas frequentist methods may be useful approximations, they shouldn’t run into any real life mistakes.
This is nearly true, were it not for the fact that frequentists don’t actually believe this.
They don’t, but could and should.
If someone manages simultaneously to believe that frequentist philosophy (probability ≡ frequency) is sound, yet frequentist methods are fallible ad-hoc methods and Bayesian methods provide the best inferences possible given our state of knowledge, then he is performing quite a feat.
I agree; this is why, instead of saying that probability is identical to frequency, or that it is frequency, we should say that it models frequency.
What is this comment supposed to add? Is it an ad hominem, or are you asking for clarification? If you don’t understand that comment, perhaps you should try rereading my original post. I have updated it a bit since you first commented, so it may be clearer now.
(edit) clarification:
The reason that probabilities model frequency is not that our data about some phenomenon are dominated by facts of frequency. Take 10 chips, 6 of them red and 4 of them blue, with 5 red ones and 1 blue one on the table and the rest not on the table; you’ll find that Bayes can be used to talk about the frequencies of these predicates in the population. You only need to start with theorems that, when interpreted, produce the assumptions I just provided, e.g., P(red and on the table) = 1⁄2, P(~red and on the table) = 1⁄10, P(red and ~on the table) = 1⁄10. From those basic statements we can infer, using Bayes, all the following results: P(red|on the table) = 5⁄6, P(~red|on the table) = 1⁄6, P( (red and on the table) or blue) = 9⁄10, P(red) = P(red|on the table) P(on the table) + P(red|~on the table) P(~on the table) = 6⁄10, etc. These are all facts about the FREQUENCY distributions of these chips’ predicates, which can be reached using Bayes and the assumptions above. We can interpret P(red) as the frequency of red chips out of all the chips, and P(red|on the table) as the frequency of red chips out of chips on the table. You’ll find that anything you prove about these frequencies using Bayesian inference will be a true claim about the frequencies of these predicates within the chips. Hence, Bayes models frequency. This is all I meant by “Bayes models frequency”. You’ll also find that it works just as well with volume or area. (I am sorry I wasn’t that concrete to begin with.)
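To make the chip example checkable, here is a minimal sketch that counts the frequencies directly and compares them with what the probability rules give. The encoding of the chips as (colour, on_table) pairs and the helper name `freq` are my own illustrative choices, nothing more.

```python
from fractions import Fraction

# Hypothetical encoding of the ten chips as (colour, on_table) pairs.
chips = (
    [("red", True)] * 5      # 5 red chips on the table
    + [("blue", True)] * 1   # 1 blue chip on the table
    + [("red", False)] * 1   # the remaining red chip
    + [("blue", False)] * 3  # the remaining blue chips
)

def freq(pred, given=lambda chip: True):
    """Relative frequency of `pred` among the chips that satisfy `given`."""
    pool = [c for c in chips if given(c)]
    return Fraction(sum(1 for c in pool if pred(c)), len(pool))

red = lambda c: c[0] == "red"
blue = lambda c: c[0] == "blue"
on_table = lambda c: c[1]
off_table = lambda c: not c[1]

# Conditional frequencies obtained by plain counting.
p_red_given_on = freq(red, on_table)     # 5/6
p_blue_given_on = freq(blue, on_table)   # 1/6
p_union = freq(lambda c: (red(c) and on_table(c)) or blue(c))  # 9/10

# Law of total probability, evaluated with counted frequencies.
p_red_total = (freq(red, on_table) * freq(on_table)
               + freq(red, off_table) * freq(off_table))

# The product rule and the total-probability rule agree with counting.
assert p_red_given_on == freq(lambda c: red(c) and on_table(c)) / freq(on_table)
assert p_red_total == freq(red) == Fraction(6, 10)
```

Every identity here is checked against a literal count over the ten chips, which is the sense of “models” I have in mind: the theorem, read as a claim about frequencies, comes out true.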
In exactly the same way, you can interpret probability theorems as talking about degrees of belief, and if you ask a Bayesian, all those interpreted theorems will come out as true statements about rational degree of belief. In this way Bayes models rational belief. You can also interpret probability theory as talking about Boston’s night life, but not every one of those interpreted theorems will be true, so probability theory does not model Boston’s night life under that interpretation. To model something means to produce only true statements about that something under a given interpretation.
Frequentists may not treat their tool box as a set of mostly unrelated approximations to perfect learning, or treat Bayes as the optimal laws of inference, but they should, as far as I can tell. And if they did, they would not cease to be frequentists; they would still use the same methods, use “probability” the same way, and still focus on long-run frequency over evidential support. The only difference is that rather than saying probability is frequency and that probability is not subjective degree of belief, they would say that probability models both frequency and subjective degree of belief. Subjective Bayesians should make a similar update, though I am sure they don’t swing the copula around as liberally as frequentists. This is what I meant when I said that frequentists could and should believe that frequentism is just a useful approximation, and that Bayes is in some sense optimal. I was never really arguing about the practical advantages of Bayesianism over frequentism, but about how they both seem to make a similar philosophical mistake in using identity or the copula when the relation of modeling is more applicable. A properly Hofstadterish formalism seems like the best way to deal with all of this comprehensively.
Do you understand what I was saying now? I really want to know. That you are confused by what seem to me to be my most basic claims, and that you are also as familiar with E. T. Jaynes as your comments suggest, is worrying to me. Does this clarification make you less confused?
Fine, let’s make up a new frequentism, which is probably already in existence: finite frequentism. Bayes still models finite frequencies, as in the example I gave of the chips.
Where a normal frequentist would say “as the number of trials goes to infinity”, the finite frequentist can say “on average” or “the expectation of”. Rather than saying that as the number of die rolls goes to infinity the fraction of sixes is 1⁄6, we can just say that as the number rises the fraction stabilizes around, and gets closer to, 1⁄6. That is a fact which can be checked with finitely many observations. If we saw that the more die rolls we added to the average, the closer the fraction of sixes approached 1⁄2, and the more tightly it hovered around 1⁄2, the frequentist claim would be falsified.
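As a rough illustration of the finite claim, here is a small simulation sketch. It assumes a pseudo-random fair die from Python's `random` module stands in for real rolls, and the function name and checkpoints are made up for the example. It only reports the fraction of sixes at a few finite sample sizes; nothing about infinity is needed to read the result.

```python
import random

def running_fraction_of_sixes(n_rolls, checkpoints, seed=0):
    """Roll a simulated fair die and record the fraction of sixes at each checkpoint."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    wanted = set(checkpoints)
    sixes = 0
    report = {}
    for n in range(1, n_rolls + 1):
        if rng.randint(1, 6) == 6:
            sixes += 1
        if n in wanted:
            report[n] = sixes / n
    return report

report = running_fraction_of_sixes(100_000, [10, 100, 1_000, 10_000, 100_000])
for n, frac in report.items():
    print(f"{n:>7} rolls: fraction of sixes = {frac:.4f}")
```

If the later checkpoints settled near 1⁄2 instead of near 1⁄6, that would count against the claim about this die, and it would do so after finitely many rolls.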
There may be no infinite populations. But the frequentist can still make do with finite frequencies and expected frequencies, and I am not sure what he would lose. There are certainly finite frequencies in the world, and average frequencies are at least empirically testable. What can the frequentist do with infinite populations or trials that he/she can’t do with expected/average frequencies?
Also, are you a finitist when it comes to calculus? The differential calculus requires much more commitment to the ideas of limit, infinity, and the infinitesimal than frequentists require, if frequentists require these concepts at all. Would you find a finitist interpretation of the calculus to be more philosophically sound than the classical approach?
Except one makes it seem like the stuff does exist and the other makes it seem like it doesn’t. If we interpret the law of large numbers as saying that after an infinite number of trials the average value of the sequence of results will equal the expected value of the random variable, then no finite amount of evidence can count as evidence for this interpretation, let alone verify it. But if we interpret the law as saying that the more trials we add, the more closely the average result should hover around the expected value of the variable, then that interpretation can be falsified and evidenced empirically using only finite observations.
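One standard way to write down the finite reading is Chebyshev's bound on the sample average. It assumes independent trials X_1, ..., X_n with a common expected value μ and a finite variance σ²; these are assumptions I am adding for the sketch, not something argued for above.

```latex
\[
  \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
  \qquad
  P\bigl(\lvert \bar{X}_n - \mu \rvert \ge \varepsilon\bigr)
  \le \frac{\sigma^2}{n\,\varepsilon^2}
  \quad \text{for every finite } n \text{ and every } \varepsilon > 0 .
\]
```

For the event “the die shows a six” with a fair die, μ = 1⁄6 and σ² = 5⁄36, so with n = 10,000 rolls and ε = 0.05 the bound is (5⁄36) ⁄ 25, about 0.0056. The statement mentions only finite n, which is what makes it the kind of claim finite observations can bear on.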
Ok, I am a Bayesian, i.e., I use Bayesianism over frequentism, and find frequentist methods rather silly. And I am at least what I would call a finite frequentist, i.e., I think Kolmogorov models finite frequency.
I am not here to say that Bayesianism is on equal ground with frequentism, at all. If Bayes’s interpreted sentences can be empirically verified, and frequentist interpretations cannot be empirically verified, then this is in Bayesianism’s favor. It means it is the more useful theory. But it is not grounds to use “is” where we should use “models” instead. It is not because Bayesians need to be put on equal footing with frequentists that I propose this terminology in place of the copula; it is because rationalists should be clear, especially in philosophy. So just to be clear, we should use “model” instead of “is” because that is what is really going on; the concepts of Hofstadterish formalism and model theory are the best way to understand how probability theory ends up telling us how to distribute beliefs. The relation between subjective degree of belief and probability theory is clearly not identity, or the copula.
Subjective degrees of belief are a part of your cognition.
Probability theory is a repeatable process consisting in shuffling squiggles on paper, or some other medium.
These are not identical.
Q.e.d.
We might call this the “paper projection fallacy”, where you project some pattern in squiggles on a piece of paper into your mind, analogous to the “mind projection fallacy”.
Bayesian probability theory is the mathematical formulation representing ideal reasoning under uncertainty.
The squiggles on the paper are our representation of this probability—they are “probability”, not probability, if you like.
No, probability, or I assume you really mean rationality, is the void.
Bayes is just playing with squiggles on paper. If, when you interpreted Bayes, you found some claim which seemed not to work, you would have to abandon Bayes or be irrational.
The squiggles on the paper are our representation of this probability.
What probability where? If you start by saying degree of belief is probability, and then show that degree of belief is probability, I am not impressed. You can call them “the representation” instead of calling them “the theory” if you want. And you can use “is” instead of “models” if you want. And you can even use “probability” instead of “degree of belief”, though I suspect that may all get rather confusing quickly. But do realize that every reason you give for saying that probability is degree of belief, a frequentist can give for saying that probability is frequency.
“Probability” is a really stupid noun, kind of like “red-hood”, or “emergence”. Notice how in the actual theory, we only ever talk about the probability of something. “Probability” is a function, not an object. Ask yourself, “what IS probability?”, really probe, and you’ll find that that is a stupid question. The right question would have been, “what does probability return given an argument?” The answer is that it might return the rational degree of belief in a proposition, the frequency of a predicate in a finite population, the frequency of a value over an infinite number of trials, the volume of a space, the area of a shape, or even the length of a line. All of these are consistent with the Kolmogorov axioms.
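Here is a small sketch of the “function, not object” point. It checks the axioms on a finite set of atoms only (non-negativity, total measure 1, and finite additivity; countable additivity is out of scope for a finite check). The chip counts come from the earlier example; the segment lengths, the helper names, and the deliberately broken `squared_share` function are all hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def make_measure(weights):
    """Return P(event): the share of the total weight carried by the event's atoms."""
    total = sum(weights.values())
    return lambda event: Fraction(sum(weights[a] for a in event), total)

def all_events(atoms):
    """Every subset of a finite set of atoms."""
    atoms = list(atoms)
    return [frozenset(s) for r in range(len(atoms) + 1)
            for s in combinations(atoms, r)]

def satisfies_finite_axioms(P, atoms):
    """Check non-negativity, normalisation and finite additivity by brute force."""
    events = all_events(atoms)
    non_negative = all(P(e) >= 0 for e in events)
    normalised = P(frozenset(atoms)) == 1
    additive = all(P(a | b) == P(a) + P(b)
                   for a in events for b in events if a.isdisjoint(b))
    return non_negative and normalised and additive

# Interpretation 1: frequency of chip kinds in a finite population.
chip_counts = {"red_on": 5, "blue_on": 1, "red_off": 1, "blue_off": 3}
# Interpretation 2: lengths of three segments that make up a line.
segment_lengths = {"a": 2, "b": 3, "c": 5}

frequency = make_measure(chip_counts)
length = make_measure(segment_lengths)
# A function that is NOT a model: squaring breaks additivity.
squared_share = lambda event: Fraction(len(event), 3) ** 2

print(satisfies_finite_axioms(frequency, chip_counts))          # True
print(satisfies_finite_axioms(length, segment_lengths))         # True
print(satisfies_finite_axioms(squared_share, segment_lengths))  # False
```

The same checker passes frequency and length and rejects the third function, which plays the role of the Boston night life interpretation: some of its interpreted “theorems” come out false.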
This assumes an “expected value” which could only be known by some other means, i.e. accepting the Bayesian notion of probability as subjective degrees of belief, or supposing an infinite number of trials. Such a definition of frequentism begs the question.
Well, it is actually in the bartender’s premise that the coin is biased, so they both know that whatever heads/trials hovers around as the number of trials rises, it is not 1⁄2.
But assuming they didn’t have that premise, what could the frequentist do, without requiring non-empirically verifiable claims as assumptions?
The only thing I can think of: he/she could resort to ranges, never actually defining the probability of heads, just determining the probability with which the actual probability, i.e. the frequency of heads, is within a given range. There is some ideal actual frequency, which would be the outcome given infinitely many trials, but you can only find a range within which it lies; it would require infinite amounts of evidence to constrain heads/trials to a point, and we don’t have that kind of time. Bayes can be extended to ranges of probability trivially. This would make it so that finite observables act as evidence for some hypotheses which include the term “infinity”. But it wouldn’t justify the whole of frequentist methodology.
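A minimal sketch of what “Bayes extended to ranges” could look like. It assumes a uniform prior over the heads-frequency and independent flips, and it approximates the integral on a midpoint grid; the function name, the prior, and the grid size are my assumptions, not part of the bartender problem.

```python
import math

def prob_frequency_in_range(heads, tails, lo, hi, grid=10_000):
    """Posterior probability that the heads-frequency lies in [lo, hi].

    Assumes a uniform prior over the frequency and independent flips;
    the integral is approximated on a midpoint grid.
    """
    thetas = [(i + 0.5) / grid for i in range(grid)]
    # Log-likelihood of the observed flips at each candidate frequency.
    log_w = [heads * math.log(t) + tails * math.log(1 - t) for t in thetas]
    peak = max(log_w)
    weights = [math.exp(lw - peak) for lw in log_w]   # rescaled to avoid underflow
    inside = sum(w for t, w in zip(thetas, weights) if lo <= t <= hi)
    return inside / sum(weights)

few = prob_frequency_in_range(60, 40, 0.5, 0.7)
many = prob_frequency_in_range(600, 400, 0.5, 0.7)
print(f"P(0.5 <= f <= 0.7 | 60 heads, 40 tails)   = {few:.3f}")
print(f"P(0.5 <= f <= 0.7 | 600 heads, 400 tails) = {many:.3f}")
```

More flips put more posterior mass inside the same range, but no finite number of flips collapses the range to a point, which is the situation described above.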
But again, even if the frequentist interpretation fails in ways in which the Bayesian interpretation does not, this is not evidence of probability being degree of belief. It is evidence of probability modeling degree of belief, and of Bayesianism having sounder ontological commitments than frequentism. This would not surprise me.
Infinite frequency is not real. But our intuitions about it are real. Kolmogorov may then be said to model actual finite frequencies, and our intuitions about infinite frequencies which are finitely and axiomatically describable. Let us not forget that there are no circles or squares anywhere either, but we should still hold that you can’t square the circle. Not all models have to be out there; some may be in here. Frequentism requires infinite frequencies, which don’t exist, for its interpretation to be true. The subjective Bayesian interpretation of Bayes does not require anything that really doesn’t exist (though degrees of belief are plenty mysterious). This is a good reason to be a subjective Bayesian and not a frequentist, one which I was not consciously aware of, but it is not a good reason to stop being a formalist.
Who cares if frequentists, or non-LW Bayesians, use the copula like a bunch of sillies, even after G.E.B. has been published? We LWers should use “identity” if we are claiming identity, and “modeling” if we are claiming a model. But realistically, the claim that “Probability theory models rational belief systems” seems much more defensible, concrete, and useful than the claim that “Probability is degree of belief.”
potato,
I don’t think there’s much value in replying to Phlebas’ latest reply.