In other words, the OP has mixed up the quotation and the referent (or the representation and the referent).
It seems to me that I am the one proposing a sharp distinction between probability theory (the representation) and rational degree of belief (the referent). If you say that probability is degree of belief, you destroy the distinction between the model and the modeled. If by “probability” you mean subjective degree of belief, I don’t really care what you call it. But know that “probability” has been used in ways that are not consistent with that synonymy claim. Since we do not have 100% belief that Bayes models ideal inference under uncertainty, Bayesian probability is not identical to subjective belief given our knowledge. If X is identical to Y, then X is isomorphic to (models) Y. Because we can still conceive of Bayes not perfectly modeling rationality without implying a contradiction, our current state of knowledge does not include that Bayes is identical to subjective degree of belief.
We learn that something is probability by looking at probability theory, not by looking at subjective belief. If rational subjective belief turned out to not be modeled by probability theory, then we would say that subjective degree of belief was not like probability, not that probability theory does not define probability.
The first person to formulate Bayes may have been thinking about rationality when he or she first created the system, or about spatial measurements, or about finite frequencies, and would have made the same formal system in every case. The interpretations would have been different, but they would all be the one identical probability theory. Which one the actual creator was thinking of is irrelevant. What spaces, beliefs, and finite frequencies all have in common is that they are modeled by probability theory. To use “probability” to refer to one of these over another is a completely arbitrary choice (mind you, I said finite frequency).
If we lose nothing by using “models” instead of “is”, why would we ever use “is”? “Is” is a much stronger claim than “models”. And frankly, I know how to check whether or not a given argument is an animal, for instance; how do I check if a given argument is a probability? I see if it satisfies the probability axioms. Finite frequency, measure, and rational degree of belief all seem to follow the probability axioms and inferences under specific, though similar, interpretations of probability theory.
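To make the "see if it satisfies the probability axioms" check concrete, here is a minimal sketch for a finite sample space. Everything in it is an illustrative assumption rather than anything from the article: the six die rolls, the function names, and the choice of a fair-die belief function are all hypothetical. It checks non-negativity, normalization, and finite additivity by brute force over every pair of disjoint events.

```python
from fractions import Fraction
from itertools import chain, combinations


def powerset(outcomes):
    """All subsets (events) of a finite sample space."""
    items = list(outcomes)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]


def satisfies_axioms(measure, outcomes):
    """Check non-negativity, normalization, and finite additivity."""
    events = powerset(outcomes)
    omega = frozenset(outcomes)
    if any(measure(e) < 0 for e in events):
        return False
    if measure(omega) != 1:
        return False
    # Additivity only has to hold for disjoint events.
    return all(measure(a | b) == measure(a) + measure(b)
               for a in events for b in events if not (a & b))


faces = range(1, 7)
rolls = [1, 2, 2, 3, 6, 6]  # hypothetical record of six die rolls


def finite_frequency(event):
    """Fraction of the recorded rolls that landed in the event."""
    return Fraction(sum(1 for r in rolls if r in event), len(rolls))


def degree_of_belief(event):
    """Hypothetical agent who treats every face as equally plausible."""
    return Fraction(len(event), 6)


def squared_belief(event):
    """A plausibility score that is normalized but not additive."""
    return Fraction(len(event), 6) ** 2


print(satisfies_axioms(finite_frequency, faces))
print(satisfies_axioms(degree_of_belief, faces))
print(satisfies_axioms(squared_belief, faces))
```

The first two functions pass the same test under different readings of what the numbers stand for. The third fails, which is the sense in which the axioms, not the interpretation, decide what counts.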
As far as I can understand you, you seem to think that because humans aren’t always rational in the sense described by Cox’s postulates, probability theory only “models” human reasoning under uncertainty. You also seem to think that probability theory is just “squiggles on paper”.
Only models? Just squiggles on paper?
You’ve misunderstood the article, I think. Probability theory (the Kolmogorov Axioms) does model correct degrees of belief and describes normatively what they should be. It also models “long-term frequencies” in the sense that the Kolmogorov Axioms also apply to such things.
None of this requires the word “probability” to refer to degrees of belief. You don’t even need a word at all to do the math and get the right answer. It’s convenient to use the word that way though, since we already have a word “frequency” that refers to the stupider idea.
(And also I suspect that most people learned the word at school mostly by being given examples of likely and unlikely things. For them, “probability” refers to the little progress bar in their mind that goes up for more likely things and down for less likely things [ie. degrees of belief]. And thus many frequentists may commit philosophical errors when they try to define it as frequencies then use the intuitive definition to draw a conclusion in the same argument. This alone is a good reason to use “probability” for beliefs and “frequencies” for, well, frequencies.)
Yes, we can use “probability is degree of belief” but we have to be very careful about this sort of word play, because what that really means is that “probability models degree of belief”.
Probability doesn’t come from attempting to model something out in the world. It comes from attempting to find a measure of degree of belief that’s consistent with certain desiderata, like “you shouldn’t believe both a thing and its opposite.” So the phrase “probability models degree of belief” is false.
You’re right, I meant to say “probability theory models theoretically optimal degree of belief updates, given other degrees of belief”. Or “probability theory models ideally rational degrees of belief.”
Right. So what the heck’s the point of the article?
Why not just continue to say that probability is degrees of belief, and is not frequency?
Because then you’ll keep arguing for decades about which one it really is, to absolutely no fruitful conclusion. Why not just keep saying that sound is air pressure and not auditory experience, or vice versa? When you do that, it makes it harder to see what is really going on. Call me conservative, but I think we should use as precise a terminology as possible. Also, it seems to me that “probability is degree of belief” is an unverifiable claim, or at least I do not know what experiences I should test it with. But really, even in your own writing you don’t feel comfortable using the copula as the relation between probability and degree of belief without italicizing it. Doesn’t that make you think that maybe there is a better word for the relation, one you wouldn’t feel the need to italicize? How about “models”? And really we shouldn’t be using “probability” as a noun; it’s a function, not an object. But we can deal with that later.
Exactly what about my article suggests that we should change our terminology to legitimize frequentism? I am saying that frequentism and subjective Bayesianism both fail the moment they use the copula with probability as the subject; that is a stupid thing to do in philosophy. It’s as bad as Hegel. “Probability” is not a noun, it is a function; it is syncategorematic like “the”, “or”, “sake”, etc., not categorematic. “Probability” does not have a physical extension. And there are things that volume has in common with degree of belief, which we might call probability-like behavior. Again, if we found that degree of belief wasn’t modeled by probability theory, we would say that subjective Bayesianism was wrong, not that probability theory does not really describe probability. If “subjective belief” did mean probability instead, then on finding that probability theory did not model ideally rational degree of belief, we would say that Kolmogorov’s axioms need to be fixed because they don’t really define probability.
That isn’t why there’s a frequentist/Bayesian dispute. Everyone agrees they are both “interpretations”. As another commenter has pointed out, the semantic argument is just a proxy for the dispute over whether one or other interpretation is preferable either philosophically or in practical terms.
Calling them interpretations seems to imply that at most one of them can be correct. “Displacement of a falling object on Earth” and “kinetic energy of a 9.8 kg object” aren’t competing interpretations of the math f(x) = 4.9x^2, they’re just two different things the equation applies to.
If the frequentists are making any error, it’s denying that beliefs must be updated according to the Kolmogorov Axioms, not asserting that frequencies can also be treated with the same laws. It’s denying the former that might lead them to apply incorrect methods in inference, which is the only problem that really matters.
The definitional dispute about sound is different in that air pressure and auditory experience are both useful concepts, and there is no competition between them.
There is a dispute; ever hear of the idealists and the realists? Luckily it is over now. But either way, it does not matter why you are using one word to stand for many things; you shouldn’t do it if you can use a terminology that is more widely accepted. I still think that Bayesianism is a better interpretation, a much better interpretation than frequentism, but what is it an interpretation of? Is it an interpretation of math? Seems to me like it is an interpretation of typographical string manipulations applied to certain basic strings.
As another commenter has pointed out, the semantic argument is just a proxy for the dispute over whether one or other interpretation is preferable either philosophically or in practical terms.
That wasn’t another commenter, that was in my article, I’m pretty sure.
If people switched to saying that probability models both subjective degrees of belief and imaginary long-run frequency, there would still be this argument; however, it would then be harder for the Bayesian revolution (with whom the momentum lies) to finally oust the cursed frequentists, because language would be used in such a way as to imply equal validity of the interpretations.
If Bayesianism wins this argument, which it probably will, it should win because it is the ideal system of statistical inference, not because its proponents managed to convince a bunch of people of a statement with absolutely no empirical consequences. If you argue about what probability is, you argue about surface bubbles of your theory that are irrelevant to the real dispute you are having, whether you are a realist or an idealist, or a frequentist or a Bayesian.
I think the interpretation of probability and what methods to use for inference are two separate debates. There was a really good discussion post on this a while back.
I’m also curious as to who exactly these frequentists are that you are arguing against. Perhaps I am spoiled by hanging out with people who regularly have to solve statistical problems, and therefore need to have a reasonable conception of statistics, but most frequentist sentiments that I encounter are fairly well-reasoned, sometimes even pointing out legitimate issues with Bayesian statistics. It is true that I sometimes get incorrect claims that I have to correct, but I don’t think becoming a Bayesian magically protects you from this.
EDIT: To clarify, the “frequentist sentiments” I referred to did not explicitly distinguish between interpretations of probability and inference algorithms, but as the goal was engineering I think the arguments were all implicitly pragmatic.
I think the interpretation of probability and what methods to use for inference are two separate debates. There was a really good discussion post on this a while back.
I completely agree with this. It seems to me that we should completely throw away the question of what probability is, and look at which form of inference is optimal.
I’m going by what I’ve read of Jaynes, Yudkowsky, and books by a couple of other writers on Bayesian statistics.
I don’t believe there are any legitimate issues with Bayesian statistics, because Bayes’s rule is derived from basic desiderata of rationality which I find entirely convincing, and it seems to me that the maximum entropy principle is the best computable approximation to Solomonoff induction (although I’d appreciate other opinions on that).
There may be legitimate issues with people failing to apply the simple mathematical laws of probability theory correctly, because the correct application can get very complicated—but that is not an issue with Bayesian statistics per se. I’m sure that in many cases, the wisest thing to do might be to use frequentist methods, but being a Bayesian does not prohibit someone from applying frequentist methods when they are a convenient approximation.
The two issues that come to mind are the difficulty of specifying priors and the computational infeasibility of performing Bayesian updates.
I don’t think anyone can reasonably dispute that if the correct prior is handed to you, together with a black box for applying Bayes’ rule, then you should perform Bayesian updates based on your data to get a posterior distribution. That is simply a mathematical theorem (Bayes’ theorem). And yes, it is also a theorem (Cox’s theorem) that any rational agent is implicitly using a prior. But we aren’t yet in a position to create a perfectly rational agent, and until we are, worrying about the specific form of consistency that is invoked for Cox’s theorem seems silly.
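For readers who want the uncontroversial part spelled out, here is a minimal sketch of a single Bayesian update over a finite set of hypotheses. The two-hypothesis coin setup, the prior, and the likelihood values are all made up for illustration; the function only applies Bayes' theorem, P(H|D) = P(D|H) P(H) / Σ_h P(D|h) P(h), and says nothing about where the prior comes from.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Return the posterior P(H|D) given P(H) and P(D|H) for each hypothesis."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())  # P(D), the normalizing constant
    return {h: joint[h] / evidence for h in joint}


# Hypothetical setup: a coin is either fair or biased towards heads (3/4).
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
likelihood_heads = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}

# Posterior after observing a single head.
posterior = bayes_update(prior, likelihood_heads)
print(posterior)
```

The hard parts mentioned above are exactly what this sketch assumes away: the prior is handed over as a dictionary, and the hypothesis space is small enough to sum over.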
It’s possible that we don’t really disagree. As a purely abstract statement about what you should do given unlimited computational resources, sure, Solomonoff induction is the way to go. I definitely agree with that. But if you need to actually solve a specific practical problem, additional considerations come into play.
By the way, what do you mean by “the maximum entropy principle is the best computable approximation to Solomonoff induction”? That sounds intriguing, so I’d be interested to have you elaborate a bit.
Regarding frequentism vs. Bayesianity in practical applications, the message I take from Yudkowsky and Jaynes is that frequentists have tended historically to lack apprehension of the fact that their methods are ad-hoc, and in general they fail to use Bayesian power when it is in fact advisable to do so—whereas Bayesians feel they can use ad-hoc approximate methods or accurate methods, whichever is appropriate to the task. This is a case in which a questionable philosophy needn’t hamstring someone’s thinking in principle, but appears to do so fairly predictably as a matter of fact.
Incidentally I’m surprised that there appears to be so much disagreement about this, given that LW is basically a forum brought into existence on the strength of Yudkowsky’s abilities as a thinker, writer and populariser, and he clearly holds frequentism in contempt. It’s not necessarily a bad thing that some people here are sympathetic to frequentism—intellectual diversity is good—I’m just surprised that there are so many on a Bayesian rationality forum!
About Maxent: I had in mind chapter 5 of this book by Li and Vitanyi.
We can formulate scientific theories in two steps. First, we formulate a set of possible alternative hypotheses, based on scientific observations or other data. Second, we select one hypothesis as the most likely one. Statistics is the mathematics of how to do this. A relatively recent paradigm in statistical inference was developed by J.J. Rissanen and by C.S. Wallace and his coauthors. The method can be viewed as a computable approximation to the incomputable approach in Section 5.2 [i.e. Solomonoff induction] and was inspired by it. In accordance with Occam’s dictum, it tells us to go for the explanation that compresses the data the most. [...]
This is the MDL (minimum description length) principle.
The ideal MDL principle selects the hypothesis H that minimizes K(H) + K(D|H) [...]
Where K is Kolmogorov complexity.
Unfortunately, the function K is not computable (Section 3.4). For practical applications one must settle for easily computable approximations. [...]
So ideal MDL, like Solomonoff induction, is also incomputable!
They go on to discuss approximations, and on page 390 (I don’t know if you have a copy of the book) they provide a usable approximation to be referred to as “MDL”. Later on page 398 they discuss Maxent, and conclude that that too is an approximation to ideal MDL.
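To give a feel for what a computable stand-in for K(H) + K(D|H) can look like, here is a crude two-part code for a 0/1 sequence. This is not the approximation from the book; it is a toy sketch under stated assumptions. The hypotheses are Bernoulli parameters of the form j/2^k, the hypothesis is charged k bits (ignoring the cost of encoding k itself), and the data is charged its Shannon code length under that parameter. The data and all names are hypothetical.

```python
import math


def data_bits(data, theta):
    """Shannon code length, in bits, of a 0/1 sequence under Bernoulli(theta)."""
    ones = sum(data)
    zeros = len(data) - ones
    return -(ones * math.log2(theta) + zeros * math.log2(1 - theta))


def two_part_mdl(data, max_precision=8):
    """Return (total_bits, precision, theta) minimizing L(H) + L(D|H)."""
    best = None
    for k in range(1, max_precision + 1):
        for j in range(1, 2 ** k):  # theta = j / 2^k, strictly inside (0, 1)
            theta = j / 2 ** k
            total = k + data_bits(data, theta)  # L(H) is taken to be k bits
            if best is None or total < best[0]:
                best = (total, k, theta)
    return best


# Hypothetical data: 75 ones followed by 25 zeros.
data = [1] * 75 + [0] * 25
total, k, theta = two_part_mdl(data)
print(total, k, theta)
```

A finer grid can only help if the better fit to the data pays for the extra bits spent describing the parameter, which is the Occam trade-off the quoted passage describes.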
As far as I can see, Maxent is more useful in practical applications than their approximate MDL. I felt that Maxent needed to be defended, since Jaynes considered it to be a major element of Bayesian probability theory; and as far as I can see there is no clearly better practical method of generating priors at this point in time such that Maxent could be considered to be one of Bayesianity’s “legitimate issues” vis a vis frequentism.
Incidentally I’m surprised that there appears to be so much disagreement about this, given that LW is basically a forum brought into existence on the strength of Yudkowsky’s abilities as a thinker, writer and populariser, and he clearly holds frequentism in contempt. It’s not necessarily a bad thing that some people here are sympathetic to frequentism—intellectual diversity is good—I’m just surprised that there are so many on a Bayesian rationality forum!
My intuition here is that you are not observing so many people who are sympathetic to frequentism, so much as people who are unsympathetic to holding contempt.
In much of the comments here you seem to be missing a simple point about mathematics and reference due to its relationship to tribal signaling between the “Bayesians” and the “Frequentists”.
I’ve yet to see anything in this article, or the resulting comments thread, to suggest that the OP has anything to say apart from “let’s say ‘models’ instead of ‘is’ (but mean the same thing)”. And the only consequence of this is to puff up frequentism.
I tried (and apparently failed miserably) to make the case that in the interests of sanity, we should define our terms such that probability ≡ subjective degrees of belief. That’s all it is, a definition—there’s no philosophical significance to this “is” beyond that. It is not a claim that the frequency interpretation doesn’t fit Cox’s postulates—this is a naive interpretation of how language is used on the OP’s part.
The definitional dispute about sound is inapt, because there is nothing to be gained by defining sound as one thing or the other. In this case however there is a real benefit to defining our terms in one particular way.
I will however delete the downvoted posts in this thread, to honour the great disapproval with which my conception of rationality has apparently met in this case.
I will however delete the downvoted posts in this thread, to honour the great disapproval with which my conception of rationality has apparently met in this case.
Generally, deleting posts with responses is impolite, as the discussion may be helpful to future readers.
I tried (and apparently failed miserably) to make the case that in the interests of sanity, we should define our terms such that probability ≡ subjective degrees of belief.
I don’t think you ever supplied a term other than “probability” that we should use for what the OP thought “probability” means. So we’re still left with three entities and two words.
I don’t think you ever supplied a term other than “probability” that we should use for what the OP thought “probability” means. So we’re still left with three entities and two words.
Seems like a non-problem. Just say “I am entering these frequencies into Bayes’s theorem”, “I am using the mathematical tools of probability theory” or something like that.
Or perhaps say “probability is a measure of subjectively objective degrees of belief”, and “probability theory is the set of mathematical tools used to compute probabilities, which can also be used to compute frequencies as the case may be”.
Which is pretty much what happens already! This is why I object to such an article—it’s a solution looking for a problem, which creates the illusion of a problem by a) being illiterate, so making itself hard to pin down b) nitpicking the use of words.
Generally, deleting posts with responses is impolite, as the discussion may be helpful to future readers.
They were also steadily generating an amount of negative karma days after posting that I felt was disproportionate, considering they were a sincere attempt to reach agreement with a less-than-articulate interlocutor.
They were also steadily generating an amount of negative karma days after posting that I felt was disproportionate, considering they were a sincere attempt to reach agreement with a less-than-articulate interlocutor.
I did not find User:potato less-than-articulate.
a) being illiterate, so making itself hard to pin down
I’m not sure what you mean by “illiterate” here, nor (thus) how it would make itself ‘hard to pin down’.
b) nitpicking the use of words.
The dispute was about the proper use of words. I did not see anything that looked like ‘nitpicking’ in that context.
The advantage of “Formalism” over “Bayesianism” or “Frequentism” is that it clearly marks the mathematical toolkit, makes it clear what Bayesians and Frequentists are separately talking about, gets rid of the slippage Frequentists allegedly make between “degrees of belief” and “frequencies”, and removes the question of what “probability” is “really” about, all without having to raise a flag in the mind-killing tribal warfare between “Bayesians” and “Frequentists”.
But then, it’s been noted that “a philosopher has never met a distinction he didn’t like”, so perhaps I’m just biased in favor of making clearer the distinction.
So in “formalism”, I understand that we are to say: “probability models frequency”, “probability models subjective degrees of belief” and “probability is the set of mathematical discoveries we have made, which deal with [ ], including such things as Bayes’s theorem”.
Whereas at the moment, Bayesians say: “probability is a measure of subjective degrees of belief”, “probability isn’t frequency”, and “probability theory is the set of mathematical discoveries we have made, which deal with probability, including such things as Bayes’s theorem”.
And frequentists say: “probability is long-run frequency”, “probability isn’t subjective degrees of belief”, and “probability theory is the set of mathematical discoveries we have made, which deal with probability, including such things as Bayes’s theorem”.
I like the Bayesian version. But the frequentist version doesn’t confuse me; I understand perfectly well that these are merely competing interpretations, and I’ve never felt the urge to argue specifically about whether probability is degrees of belief or is frequency—nor have I ever seen anyone else do so. Clearly that would be a stupid argument, just like the definitional dispute about sound. However, sensible people do use these terms, arguing about whether probability ‘is’ one or the other, as a proxy for a more substantive argument about which is the better—i.e. more philosophically parsimonious, and having better practical outcomes—interpretation. (Actually they are more likely to phrase the argument as “probability should be considered to be X”, and then say probability is X when they aren’t having the argument, but hey.)
As for the “formalist” version, firstly it puts the frequentist and Bayesian interpretations on a level footing. Even if sensible people were wasting time and effort arguing specifically over a mere definition, the cost of conceding ground to the problematic frequentist interpretation outweighs any benefit from ending that argument, in comparison to the option of simply carrying on using the language of the Bayesian.
Furthermore it appears to me that probability theory, given this use of language, lacks a referent. Probability theory has been renamed (simply) probability, and it no longer appears to be the theory of anything. Whether or not this use of language could be considered wrong per se, it hardly seems to be clearing up any philosophical confusion! If I ask “what is this thing that I am computing using Bayes’s theorem?”, the answer is no longer “the posterior probability”—if probability is the new word for the mathematical tools of probability theory, the phrase posterior probability no longer means anything. So perhaps I’ll have to invent a new word to refer to the same thing that the word probability used to refer to.
Do you begin to see why I think this is a waste of time?
NB: I think we’re making much more progress than I made with user:potato. That’s what I mean about the difficulty of having to argue with someone who is inarticulate, i.e. can’t state his case properly.
“probability is the set of mathematical discoveries we have made, which deal with [ ], including such things as Bayes’s theorem”.
Probably better put in terms of being a formal system, rather than “a set of mathematical discoveries”. But I fear that tends towards begging the question!
As for the “formalist” version, firstly it puts the frequentist and Bayesian interpretations on a level footing. Even if sensible people were wasting time and effort arguing specifically over a mere definition, the cost of conceding ground to the problematic frequentist interpretation outweighs any benefit from ending that argument, in comparison to the option of simply carrying on using the language of the Bayesian.
This treatment (notably the use of terms like “conceding ground”) suggests that you are engaging in a “political”/“debate” mode rather than a “truth-seeking” mode. This leads me to believe that we have more to lose by accepting the “Bayesian/Frequentist” duality than by dissolving it entirely and changing our terminology to match. This matches my impression of previous forays into the “Bayesian/Frequentist” ‘holy wars’.
If politics is mind-killing, then it must certainly be avoided even at great cost with respect to our most basic tools of rationality.
Do you begin to see why I think this is a waste of time?
Indeed, though in that case you’ve spent far more time on this than most who exercised the default ‘ignore’ option.
If I ask “what is this thing that I am computing using Bayes’s theorem?”, the answer is no longer “the posterior probability”—if probability is the new word for the mathematical tools of probability theory, the phrase posterior probability no longer means anything. So perhaps I’ll have to invent a new word to refer to the same thing that the word probability used to refer to.
A good point.
That’s what I mean about the difficulty of having to argue with someone who is inarticulate, i.e. can’t state his case properly.
I understood what you meant—I just did not see any inarticulateness on the part of User:potato.
I’ve never felt the urge to argue specifically about whether probability is degrees of belief or is frequency—nor have I ever seen anyone else do so.
I normally see this being explicitly the subject on Bayesian/Frequentist debates, and many long conversations with philosophers have revolved around whether “equating probability with subjective belief” is an “ontological confusion”.
This treatment (notably the use of terms like “conceding ground”) suggests that you are engaging in a “political”/“debate” mode rather than a “truth-seeking” mode.
Duly noted. I’ll try not to give this impression in future.
I normally see this being explicitly the subject on Bayesian/Frequentist debates, and many long conversations with philosophers have revolved around whether “equating probability with subjective belief” is an “ontological confusion”.
I may have simply failed to notice these arguments taking place. In order to dissolve any such ostensible ontological question, I’d recommend pointing out that to say probability is one or other thing is merely a statement to the effect that one interpretation is preferred for some reason by the writer—since both interpretations satisfy the Cox postulates or Kolmogorov axioms, we could define probability to be either subjective degrees of belief or long-run frequency, and make sound and rational inferences in either case (albeit perhaps not with the same efficiency). This should be enough to persuade an otherwise sensible person that he’s engaged in a futile argument about definitions.
Formalism attempts to solve the problem by effectively tabooing the concept of probability such that it no longer has a definition. Although we might be able to get around the problem that I mentioned by answering the question “what is this thing that I am computing using Bayes’s theorem?” by saying “the posterior subjective degree of belief” or “the posterior frequency”, it’s easy to see how the same kind of philosophers would end up arguing over whether, in the case of a coin flip for example, we are really talking about prior and posterior subjective degrees of belief, or about prior and posterior long-run frequencies. And we would have lost the use of the word “probability”, which makes our messages shorter than they would otherwise be.
To the extent that there is such a thing as the proper use of words, it isn’t deleting useful words from our vocabulary in order to (probably unsuccessfully) prevent people from having a definitional argument that could best be dispelled by introducing them to such notions as “dissolving the question” and reductionism. On the other hand I’ll give user:potato credit for exposing an issue that may be more problematic than I at first believed.
I expect that we are substantially in agreement at this point.
FWIW, I think my three preferred terms are “Probabilities”, “Frequencies”, and “Normed Measure Theory”. That’s what Kolmogorov’s formalization amounts to anyway, and as the OP said it truly need not be connected to either probabilities or frequencies in a given use.
I don’t understand. Based on reading through the passages you referenced in PtLoS, maximum entropy is a way of choosing a distribution out of a family of distributions (which, by the way, is a frequentist technique, not a Bayesian one). Solomonoff induction is a choice of prior. I don’t really understand in what sense these are related to each other, or in what sense Maxent generates priors at all.
Incidentally I’m surprised that there appears to be so much disagreement about this, given that LW is basically a forum brought into existence on the strength of Yudkowsky’s abilities as a thinker, writer and populariser, and he clearly holds frequentism in contempt.
I’ve always felt that the frequentists that Eliezer argues against are straw men. As I said earlier, I’ve never met a frequentist who is guilty of the accusations that you keep making, although I have met Bayesians whose philosophy interfered with their ability to do good statistical modeling / inference. Have you actually run into the people who you seem to be arguing against? If not, then I think you should restrict yourself to arguing against opinions that people are actually trying to support, although I also think that whether or not some very foolish people happen to be frequentists is irrelevant to the discussion (something Eliezer himself discussed in the “Reversed Stupidity is not Intelligence” post).
If you know nothing about a variable except that it’s in the interval [a, b] your probability distribution must be from the class of distributions where p(x) = 0 for x outside of [a, b]. You pick the distribution of maximal entropy from this class as your prior, to encode ignorance of everything except that x ∈ [a,b].
That is one way Maxent may generate a prior, anyway.
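A quick numerical sanity check of that claim, as a sketch rather than a proof: split [a, b] into n equal-width bins (the bin count and the two comparison distributions below are arbitrary choices of mine) and compare Shannon entropies. With no constraint other than the support, the uniform distribution should come out on top, with entropy log n.

```python
import math


def entropy(p):
    """Shannon entropy, in nats, of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


n = 10  # number of equal-width bins covering [a, b]; hypothetical choice

uniform = normalize([1] * n)
triangular = normalize([min(i + 1, n - i) for i in range(n)])  # peaked in the middle
skewed = normalize([2.0 ** -i for i in range(n)])              # mass piled near a

for name, dist in [("uniform", uniform), ("triangular", triangular), ("skewed", skewed)]:
    print(name, entropy(dist))
```

In the continuous case the same conclusion holds for differential entropy, giving p(x) = 1/(b − a) on [a, b]; the discretization here is only a convenience for checking it numerically.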
I’m pretty sure almost all frequentist methods are derivable from Bayes, or are close approximations of Bayes. Do they have any tool which is radically un-Bayesian?
See paulfchristiano’s examples elsewhere in this thread.
Another example would be support vector machines, which work really well in practice but aren’t Bayesian (although it’s possible that they are actually Bayesian and I just can’t figure out what prior they correspond to).
There are also neural networks, which are sort of Bayesian but (I think?) not really. I’m not actually that familiar with neural nets (or SVMs for that matter) so I could just be wrong.
ETA: It is the case that every non-dominated decision procedure is either a Bayesian procedure or the limit of Bayesian procedures (which I think could alternately be thought of as a Bayesian procedure with a potentially improper prior). So in that sense, for any frequentist procedure that is not Bayesian, there is another procedure that gets higher expected utility in all possible worlds, and is therefore strictly better. The only problem is that this is again an abstract statement about decision procedures, and doesn’t take into account the computational difficulty of actually finding the better procedure.
This paper is the closest I’ve ever seen to a fully Bayesian interpretation of SVMs; mind you, the authors still use “pseudo-likelihood” to describe the data-dependent part of the optimization criterion.
Neural networks are just a kind of non-linear model. You can perform Bayes upon them if you want.
That is my understanding, too. Frequentists claim not to have priors, but in fact they just use uninformative priors implicitly. In a more fundamental sense, if they genuinely had no priors then they would be unable even to interpret the results of an experiment.
The definitional dispute about sound is different in that air pressure and auditory experience are both useful concepts, and there is no competition between them.
There is a dispute; ever hear of the idealists and the realists? Luckily it is over now. But either way, it does not matter why you are using one word to stand for many things; you shouldn’t do it if you can use a terminology that is more widely accepted. I still think that bayesianism is a better interpretation, a much better interpretation than frequentism, but what is it an interpretation of? Is it an interpretation of math? Seems to me like it is an interpretation of typographical string manipulations applied to certain basic strings.
As another commenter has pointed out, the semantic argument is just a proxy for the dispute over whether one or other interpretation is preferable either philosophically or in practical terms.
That wasn’t another commenter, that was in my article, I’m pretty sure.
If people switched to saying that probability models both subjective degrees of belief and imaginary long-run frequency, there would still be this argument; however, it would then be harder for the Bayesian revolution (with whom the momentum lies) to finally oust the cursed frequentists, because language would be used in such a way as to imply equal validity of the interpretations.
If bayesianism wins this argument, which it probably will, it should win because it is the ideal system of statistical inference, not because its proponents managed to convince a bunch of people of a statement with absolutely no empirical consequences. If you argue about what probability is, you argue about surface bubbles of your theory that are just irrelevant to the real dispute you are having, whether you are a realist or an idealist, or a frequentist or a bayesian.
Probability theory is maths, and although I agree that questions like “where is maths?” and “what is maths?” and “does maths exist?” are confusing
See, these questions are not confusing to me at all. Hofstadter’s formalism deals with them perfectly. Have you ever read G.E.B.? I assumed so, but I wasn’t sure, maybe you haven’t.
Yes, I do think that probability theory is a repeatable process of typographical string manipulations. What do you think it is?
Ultimately language should be useful, and I don’t see the point of changing the word “is” to the word “models”. This wouldn’t change my beliefs about probability theory; I’d just be using the word “models” to mean the same thing as the word “is”. And I would then lack a means of saying that the Bayesian interpretation of probability is good, and the frequentist interpretation is stupid and counter-productive – I want to be able to say that probability is X, and probability isn’t Y, because this is the most useful way of using language to talk about probability theory – why would I want to put the good interpretation and the dumb interpretation on an equal footing by saying that probability “models” X and also “models” Y?
Here I completely disagree, and almost wonder if you haven’t been reading my comments. Bayesianism is stronger, more capable, more perfect, more rational, and more useful than frequentism, first of all, and all of that has nothing to do with the commitment to conceptualism that subjective bayes requires. This is all still true if you are a formalist.
Bayesianism is not righter than frequentism because probabilities are really subjective beliefs and the frequentists were wrong to say they are frequencies. Bayesians are righter than frequentists because bayes-inferences are deductively demonstrable to win more than frequentist-inferences. Again, the argument about what probability really is is just a way to disguise the argument about whose statistical method is more successful. The only way the frequentist even has a shot at such an argument is if it is disguised as a question about what probability is instead of a question about whose inferences are theoretically ideal.
So um, platonism? Really? Why? What does it get you that formalism doesn’t with less ontological commitment?
The frequentist/Bayesian dispute is of real import, because ad-hoc frequentist statistical methods often break down in extreme cases, throw away useful data, only work well with Gaussian sampling distributions etc.
I think you have this backwards. Frequentist techniques typically come with adversarial guarantees (i.e., “as long as the underlying distribution has bounded variance, this method will work”), whereas Bayesian techniques, by choosing a specific prior (such as a Gaussian prior), are making an assumption that will hurt them in extreme cases or when the data is not drawn from the prior. The tradeoff is that frequentist methods tend to be much more conservative as a result (requiring more data to come to the same conclusion).
If you have a reasonable Bayesian generative model, then using it will probably give you better results with less data. But if you really can’t even build the model (i.e. specify a prior that you trust) then frequentist techniques might actually be appropriate. Note that the distinction I’m drawing is between Bayesian and frequentist techniques, as opposed to Bayesian and frequentist interpretations of probability. In the former case, there are actual reasons to use both. In the latter case, I agree with you that the Bayesian interpretation is obviously correct.
Bayesian techniques, by choosing a specific prior (such as a Gaussian prior), are making an assumption that will hurt them in extreme cases or when the data is not drawn from the prior. The tradeoff is that frequentist methods tend to be much more conservative as a result (requiring more data to come to the same conclusion).
Bayesian methods with uninformative (possibly improper) priors agree with frequentist methods whenever the latter make sense.
Can you explain further? Casually, I consider results like compressed sensing and multiplicative weights to be examples of frequentist approaches (as do people working in these areas), which achieve their results in adversarial settings where no prior is available. I would be interested in seeing how Bayesian methods with improper priors recommend similar behavior.
I let you choose some linear functionals, and then tell you the value of each one on some unknown sparse vector (compressed sensing).
We play an iterated game with unknown payoffs; you observe your payoff in each round, but nothing more, and want to maximize total payoff (multiplicative weights).
Put even more simply, what is the Bayesian method that plays randomly in rock-paper-scissors against an unknown adversary? Minimax play seems like a canonical example of a frequentist method; if you have any fixed model of your adversary you might as well play deterministically (at least if you are doing consequentialist loss minimization).
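To make the rock-paper-scissors point concrete, here is a minimal sketch that compares the worst-case expected payoff of the uniform mixed strategy with that of a deterministic one. The payoff convention (+1 win, -1 loss, 0 tie) and all names are assumptions made for illustration. Checking only the opponent's pure moves is enough, since the best response to a fixed mix is always a pure move.

```python
from fractions import Fraction

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie (assumed convention)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected_payoff(my_mix, their_move):
    """Expected payoff of a mixed strategy against one fixed opponent move."""
    return sum(p * payoff(m, their_move) for m, p in my_mix.items())

def worst_case(my_mix):
    """Expected payoff against the opponent move that hurts us most."""
    return min(expected_payoff(my_mix, t) for t in MOVES)

uniform = {m: Fraction(1, 3) for m in MOVES}
always_rock = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}

print(worst_case(uniform))      # 0
print(worst_case(always_rock))  # -1
```

The uniform mix guarantees an expected payoff of 0 no matter what the adversary does, while any deterministic choice can be fully exploited by an adversary who anticipates it.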
Are you referring to the result that every non-dominated decision procedure is either a Bayesian procedure or a limit of Bayesian procedures? If so, one could imagine a frequentist procedure that is strictly dominated by other procedures, but where finding the dominating procedures is computationally infeasible. Alternately, a procedure could be non-dominated, and thus Bayesian for the right choice of prior, but the correct choice of prior could be difficult to find (the only proof I know of the “non-dominated ⇒ Bayesian” result is non-constructive).
What I was trying to emphasise is that, pace “potato”, the frequentist/Bayesian dispute isn’t just an argument about words but actually has ramifications for how one is likely to approach statistical inference—so it shouldn’t be compared to the definitional dispute “If a tree falls in a forest and no one hears it, does it make a sound?”
If someone treated frequentist approaches as though they were equivalent to Bayesian methods in general, then he would occasionally be drastically in error. PT:TLoS offers many examples of this (for example the comparison of a Bayesian “psi test” and the chi-squared test on page 300). My comment about the Gaussian distribution had in mind Jaynes’s discussion of “pre-data and post-data considerations” starting on page 499, in which he discusses the fact that orthodox practice answers the wrong question: it gives correct answers to “if the hypothesis being tested is in fact true, what is the probability that we shall get data indicating that it is true?” when the real problems of scientific inference are concerned with the question “what is the probability conditional on the data that the hypothesis is true?”, and this problem is the result of frequentist philosophy’s failure to admit the existence of prior and posterior probabilities for a fixed parameter or an hypothesis. He suggests that this conflation goes somewhat unnoticed because in the case of the commonly encountered Gaussian sampling distribution the difference is relatively unimportant, but compares another case (Cauchy sampling distributions) in which the Bayesian analysis is far superior.
On the other hand the interlocutors in the standard definitional dispute have no substantive disagreement, i.e. they actually anticipate the same things, so their disagreement amounts to nothing apart from the fact that they waste their time arguing about words.
I’ll defer to your opinion (which is probably much better informed than mine) on whether frequentist methods work well when their limitations are borne in mind.
Why can’t a frequentist say: “Bayesians are conflating probability with subjective degree of belief.” ? They were here first after all.
Probability does model frequency, and it does model subjective degree of belief, and this is not a contradiction. Using the copula is the problem, obviously: if subjective degree of belief is not frequency, and probability is frequency, then probability is not subjective degree of belief. Analogously, if subjective degree of belief is not frequency, and probability is subjective degree of belief, then probability is not frequency.
The problem is that they all conflate “probability” with something else: the bayesian conflates probability with subjective degree of belief, and the frequentist conflates probability with frequency.
The frequentist/Bayesian dispute is of real import, because ad-hoc frequentist statistical methods often break down in extreme cases, throw away useful data, only work well with Gaussian sampling distributions etc.
The debate over whether to use Bayesian methods or frequentist methods is of import. I think potato was trying to say this here:
How we should actually model the situation as a probability distribution depends on our goal. But remember that Bayesianism is the stronger magic.
But the question of whether probability is frequency, or if probability is subjective degree of belief, is just as silly as a dispute over whether numbers are quantity, or if they are orders. The answer is that numbers model both, and are neither.
Probability “models” frequency in the sense that sometimes frequency data dominates all of our other knowledge about some phenomenon.
No, probability models frequency in the sense that there is an interpretation of Kolmogorov’s axioms which only mentions terms from the part of our language used to talk about frequency, and all of Kolmogorov’s theorems come out as true statements about frequency under this interpretation.
I mean, literally, Bayes is an arithmetic of odds and fractions, of course it models frequency. At least as well as fractions and odds do. Probability is a frequency as often as it is a fraction or an odds.
So you are saying that as long as frequentists understand that Bayesian methods are theoretically ideal and cannot be improved upon, whereas frequentist methods may be useful approximations, they shouldn’t run into any real life mistakes.
This is nearly true, were it not for the fact that frequentists don’t actually believe this.
They don’t, but could and should.
If someone manages simultaneously to believe that frequentist philosophy (probability ≡ frequency) is sound, yet frequentist methods are fallible ad-hoc methods and Bayesian methods provide the best inferences possible given our state of knowledge, then he is performing quite a feat.
I agree, this is why instead of saying that probability is identical to frequency, or that it is frequency, we should say that it models frequency.
What is this comment supposed to add? Is it an ad hominem, or are you asking for clarification? If you don’t understand that comment perhaps you should try rereading my original post, I have updated it a bit since you first commented, perhaps it is clearer.
(edit) clarification:
The reason that probabilities model frequency is not because our data about some phenomena are dominated by facts of frequency. If you take 10 chips, 6 of them red, 4 of them blue, with 5 red ones and 1 blue one on the table, and the rest not on the table, you’ll find that bayes can be used to talk about the frequencies of these predicates in the population. You only need to start with theorems that when interpreted produce the assumptions I just provided, e.g., P(red and on the table) = 1⁄2, P(~red and on the table) = 1⁄10, P(red and ~on the table) = 1⁄10, P(~red and ~on the table) = 3⁄10. From those basic statements we can infer using bayes all the following results: P(red|on the table) = 5⁄6, P(~red|on the table) = 1⁄6, P( (red and on the table) or blue) = 9⁄10, P(red) = P(red|on the table) P(on the table) + P(red|~on the table) P(~on the table) = 6⁄10, etc. These are all facts about the FREQUENCY distributions of these chips’ predicates, which can be reached using bayes and the assumptions above. We can interpret P(red) as the frequency of red chips out of all the chips, and P(red|on the table) as the frequency of red chips out of chips on the table. You’ll find that anything you prove about these frequencies using bayesian inference will be a true claim about the frequencies of these predicates within the chips. Hence, bayes models frequency. This is all I meant by bayes models frequency. You’ll also find that it works just as well with volume or area. (I am sorry I wasn’t that concrete to begin with.)
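The chip example can be checked by plain counting. The sketch below builds the ten chips as described (6 red, 4 blue, with 5 red and 1 blue on the table) and reads each probability as a relative frequency; the helper names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

# The ten chips from the example, as (color, on_table) pairs.
chips = (
    [("red", True)] * 5 + [("red", False)] * 1 +
    [("blue", True)] * 1 + [("blue", False)] * 3
)

def freq(pred, given=lambda c: True):
    """Relative frequency of `pred` among the chips satisfying `given`."""
    pool = [c for c in chips if given(c)]
    return Fraction(sum(1 for c in pool if pred(c)), len(pool))

def is_red(c):
    return c[0] == "red"

def on_table(c):
    return c[1]

def off_table(c):
    return not c[1]

p_red_given_on = freq(is_red, on_table)
p_blue_given_on = freq(lambda c: not is_red(c), on_table)
p_red_on_or_blue = freq(lambda c: (is_red(c) and on_table(c)) or not is_red(c))

# Law of total probability, computed from conditional frequencies.
p_on = freq(on_table)
p_red_total = p_red_given_on * p_on + freq(is_red, off_table) * (1 - p_on)

print(p_red_given_on)    # 5/6
print(p_blue_given_on)   # 1/6
print(p_red_on_or_blue)  # 9/10
print(p_red_total)       # 3/5
```

The value obtained through the law of total probability matches the directly counted frequency of red chips, which is the sense in which the formal rules model the frequencies.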
In the same exact way, you can interpret probability theorems as talking about degrees of belief, and if you ask a bayesian, all those interpreted theorems will come out as true statements about rational degree of belief. In this way bayes models rational belief. You can also interpret probability theory as talking about Boston’s night life, but not every one of those interpreted theorems will be true, so probability theory does not model Boston’s night life under that interpretation. To model something means to produce only true statements about that something under a given interpretation.
Frequentists may not treat their tool box as a set of mostly unrelated approximations to perfect learning, or treat bayes as the optimal laws of inference, but they should as far as I can tell. And if they did, they would not cease to be frequentists; they would still use the same methods, use “probability” the same way, and still focus on long run frequency over evidential support. The only difference is that rather than saying probability is frequency and that probability is not subjective degree of belief, they would say that probability models both frequency and subjective degree of belief. Subjective bayesians should make a similar update, though I am sure they don’t swing the copula around as liberally as frequentists. This is what I meant when I said that frequentists could and should believe that frequentism is just a useful approximation, and that bayes is in some sense optimal. I was never really arguing about the practical advantages of bayesianism over frequentism, but about how they both seem to make a similar philosophical mistake in using identity or the copula when the relation of modeling is more applicable. A properly Hofstadterish formalism seems like the best way to deal with all of this comprehensively.
Do you understand what I was saying now? I really want to know. That you are confused by what seem to me to be my most basic claims, and that you are also as familiar with E. T. Jaynes as your comments suggest, is worrying to me. Does this clarification make you less confused?
Fine, let’s make up a new frequentism, which is probably already in existence: finite frequentism. Bayes still models finite frequencies, as in the example I gave of the chips.
When a normal frequentist would say “as the number of trials goes to infinity”, the finite frequentist can say “on average” or “the expectation of”. Rather than saying that as the number of die rolls goes to infinity the fraction of sixes is 1⁄6, we can just say that as the number rises, the fraction stabilizes around 1⁄6 and gets closer to it. That is a fact which is finitely verifiable. If we saw that the more die rolls we added to the average, the closer the fraction of sixes approached 1⁄2, and the closer it hovered around 1⁄2, the frequentist claim would be falsified.
There may be no infinite populations. But the frequentist can still make do with finite frequencies and expected frequencies, and I am not sure what he would lose. There are certainly finite frequencies in the world, and average frequencies are at least empirically testable. What can the frequentist do with infinite populations or trials that he/she can’t do with expected/average frequencies?
Also, are you a finitist when it comes to calculus? Because the differential calculus requires much more commitment to the idea of a limit, infinity, and the infinitesimal than frequentists require, if frequentists require these concepts at all. Would you find a finitist interpretation of the calculus to be more philosophically sound than the classical approach?
Except one makes it seem like the stuff does exist and the other makes it seem like it doesn’t. If we interpret the law of large numbers as saying that after an infinite number of trials, the average value of that sequence of results will equal the expected value of the random variable, then no finite amount of evidence is enough to be evidence of this interpretation, let alone to verify it. But if we interpret the law as saying that the more trials we add, the more closely the average result should hover around the expected value of the variable, then that interpretation can be falsified and evidenced empirically using only finite observations.
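The finitely checkable reading of the law of large numbers can be tried out with a simulated die. This is only a sketch: the pseudo-random generator stands in for a physical die, and the seed and checkpoint sizes are arbitrary assumptions.

```python
import random

def running_fraction_of_sixes(n_rolls, checkpoints, seed=0):
    """Roll a simulated fair die and record the fraction of sixes at checkpoints."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = 0
    recorded = {}
    for i in range(1, n_rolls + 1):
        if rng.randint(1, 6) == 6:
            sixes += 1
        if i in checkpoints:
            recorded[i] = sixes / i
    return recorded

results = running_fraction_of_sixes(200_000, checkpoints={100, 10_000, 200_000})
for n in sorted(results):
    print(f"after {n:>7} rolls: fraction of sixes = {results[n]:.4f}")
```

A run in which the fraction settled near 1⁄2 instead of 1⁄6 would count as finite evidence against the claim that the die is fair, without any appeal to infinitely many trials.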
Ok, I am a Bayesian, i.e., I use bayesianism over frequentism, and find frequentist methods rather silly. And I am at least what I would call a finite frequentist, i.e., I think Kolmogorov models finite frequency.
I am not here to say that Bayesianism is on equal ground with Frequentism, at all. If Bayes’s interpreted sentences can be empirically verified, and frequentist interpretations cannot be empirically verified, then this is in bayesianism’s favor. It means it is the more useful theory. But it is not grounds to use “is” where we should use “models” instead. It is not because bayesians need to be put on equal footing with frequentists that I propose this terminology in place of the copula; it is because rationalists should be clear, especially in philosophy. So just to be clear, we should use “model” instead of “is” because it is what is really going on; the concepts of Hofstadterish formalism and model theory are the best way to understand how probability theory ends up telling us how to distribute beliefs. The relation between subjective degree of belief and probability theory is clearly not identity, or the copula.
Subjective degrees of belief are a part of your cognition.
Probability theory is a repeatable process consisting in shuffling squiggles on paper, or some other medium.
These are not identical.
Q.e.d.
We might call this the “paper projection fallacy”, where you project some pattern in squiggles on a piece of paper into your mind. It is analogous to the “mind projection fallacy”.
Bayesian probability theory is the mathematical formulation representing ideal reasoning under uncertainty
The squiggles on the paper are our representation of this probability—they are “probability”, not probability, if you like.
no, probability, or I assume you really mean rationality, is the void
Bayes is just playing with squiggles on paper. If when you interpreted bayes, you found some claim, which seemed to not work, you would have to abandon bayes, or be irrational.
The squiggles on the paper are our representation of this probability.
What probability where? If you start by saying degree of belief is probability, and then show that degree of belief is probability, I am not impressed. You can call them “the representation” instead of calling them “the theory” if you want. And you can use “is” instead of “models” if you want. And you can even use “probability” instead of “degree of belief”, though I suspect that may all get rather confusing quickly. But do realize that every reason you give for saying that probability is degree of belief, a frequentist can give for saying that probability is frequency.
“Probability” is a really stupid noun, kind of like “red-hood”, or “emergence”. Notice how in the actual theory, we only ever talk about the probability of something. “Probability” is a function, not an object. Ask yourself: “what IS probability?” Really probe, and you’ll find that that is a stupid question. The right question would have been, “what does probability return given an argument?” The answer is that it might return the rational degree of belief of a proposition, the frequency of a predicate out of a finite population, the frequency of a value out of an infinite number of trials, the volume of a space, the area of a shape, or even the length of a line. All of these are consistent with the Kolmogorov axioms.
This assumes an “expected value” which could only be known by some other means, i.e. accepting the Bayesian notion of probability as subjective degrees of belief, or supposing an infinite number of trials. Such a definition of frequentism begs the question.
Well it is actually in the bartender’s premise that the coin is biased, so they both know that whatever heads/trials hovers around as trials rises, it is not 1⁄2.
But assuming they didn’t have that premise, what could the frequentist do, without requiring non-empirically verifiable claims as assumptions?
The only thing I can think of: he/she could resort to ranges, never actually defining the probability of heads, just determining the probability with which the actual probability, i.e. the frequency of heads, is within a given range. There is some ideal actual frequency, which would be the outcome given infinitely many trials, but you can only find a range within which it is, and it would require infinite amounts of evidence to constrain heads/trials to a point; and we don’t have that kind of time. Bayes can be extended to ranges of probability trivially. This would make it so that finite observables act as evidence for some hypotheses which include the term “infinity”. But it wouldn’t justify the whole of frequentist methodology.
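One way to read the “ranges” idea is as a posterior probability that the coin’s bias lies in an interval. The sketch below is only an illustration under stated assumptions: a flat prior over the bias, a grid approximation, and made-up counts of 70 heads and 30 tails that do not come from the thread.

```python
def posterior_on_grid(heads, tails, grid_size=1001):
    """Posterior over a coin's bias on an evenly spaced grid, assuming a flat prior."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Likelihood of the observed counts at each candidate bias.
    weights = [(b ** heads) * ((1 - b) ** tails) for b in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]

def prob_bias_in_range(grid, post, low, high):
    """Posterior mass assigned to biases between low and high (inclusive)."""
    return sum(p for b, p in zip(grid, post) if low <= b <= high)

# Hypothetical data, chosen only for illustration.
grid, post = posterior_on_grid(heads=70, tails=30)
mass = prob_bias_in_range(grid, post, 0.6, 0.8)
print(f"P(0.6 <= bias <= 0.8 | data) is roughly {mass:.3f}")
```

More tosses narrow the range that holds most of the mass, but no finite number of tosses collapses it to a single point.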
But again, even if the frequentist interpretation fails in ways which the bayesian interpretation does not, this is not evidence of probability being degree of belief. It is evidence of probability modeling degree of belief, and of Bayesianism having sounder ontological commitments than frequentism. This would not surprise me.
Infinite frequency is not real. But our intuitions about it are real. Kolmogorov may then be said to model actual finite frequencies, and our intuitions about infinite frequencies which are finitely and axiomatically describable. Let us not forget that there are not circles or squares anywhere either, but we should still hold that you can’t square the circle. Not all models have to be out there; some may be in here. Frequentism requires infinite frequencies for its interpretation to be true, and those don’t exist. The subjective bayes interpretation of bayes does not require anything that really doesn’t exist (though degrees of belief are plenty mysterious). This is a good reason to be a subjective Bayesian, and not a frequentist, which I was not aware of consciously, but it is not a good reason to stop being a formalist.
Who cares if frequentists, or non-LW bayesians, use the copula like a bunch of sillies, even after G.E.B. is published? We LWers should use “identity” if we are claiming identity, and “modeling” if we are claiming a model. But realistically, the claim that “Probability theory models rational belief systems.” seems much more defensible, concrete, and useful than the claim that “Probability is degree of belief.”
If we lose nothing by using “models” instead of “is”, why would we ever use “is”? “Is” is a much stronger claim than “models”. And frankly, I know how to check whether or not a given argument is an animal, for instance; how do I check if a given argument is a probability? I see if it satisfies the probability axioms. Finite frequency, measure, and rational degree of belief all seem to follow the probability axioms and inferences under specific, though similar, interpretations of probability theory.
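“See if it satisfies the probability axioms” can be made mechanical for a finite outcome space. The sketch below checks non-negativity, normalization, and finite additivity for three candidate functions; the outcome labels, counts, and segment lengths are invented for the illustration, and the squared-frequency function is included only as something that fails the check.

```python
from fractions import Fraction
from itertools import chain, combinations

def powerset(outcomes):
    """All events (subsets) of a finite outcome space."""
    items = list(outcomes)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

def satisfies_axioms(P, outcomes):
    """Check non-negativity, normalization, and finite additivity of P."""
    events = powerset(outcomes)
    if any(P(e) < 0 for e in events):
        return False
    if P(frozenset(outcomes)) != 1:
        return False
    return all(P(a | b) == P(a) + P(b)
               for a in events for b in events if not (a & b))

outcomes = ["a", "b", "c"]
counts = {"a": 5, "b": 1, "c": 4}  # hypothetical finite population
lengths = {"a": Fraction(1, 2), "b": Fraction(3, 2), "c": Fraction(2)}  # hypothetical segments

def frequency(event):
    """Share of the population falling in the event."""
    return Fraction(sum(counts[o] for o in event), sum(counts.values()))

def relative_length(event):
    """Share of the total length covered by the event's segments."""
    return sum(lengths[o] for o in event) / sum(lengths.values())

def squared_frequency(event):
    """Deliberately non-additive function, used as a counterexample."""
    return frequency(event) ** 2

print(satisfies_axioms(frequency, outcomes))          # True
print(satisfies_axioms(relative_length, outcomes))    # True
print(satisfies_axioms(squared_frequency, outcomes))  # False
```

Finite frequency and normalized length pass the same check even though they are about different things, which is the modeling relation described above.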
Only models? Just squiggles on paper?
You’ve misunderstood the article, I think. Probability theory (the Kolmogorov Axioms) does model correct degrees of belief and describes normatively what they should be. It also models “long-term frequencies” in the sense that the Kolmogorov Axioms also apply to such things.
None of this requires the word “probability” to refer to degrees of belief. You don’t even need a word at all to do the math and get the right answer. It’s convenient to use the word that way though, since we already have a word “frequency” that refers to the stupider idea.
(And also I suspect that most people learned the word at school mostly by being given examples of likely and unlikely things. For them, “probability” refers to the little progress bar in their mind that goes up for more likely things and down for less likely things [i.e., degrees of belief]. And thus many frequentists may commit philosophical errors when they try to define it as frequencies and then use the intuitive definition to draw a conclusion in the same argument. This alone is a good reason to use “probability” for beliefs and “frequencies” for, well, frequencies.)
Yes, we can use “probability is degree of belief” but we have to be very careful about this sort of word play, because what that really means is that “probability models degree of belief”.
Probability doesn’t come from attempting to model something out in the world. It comes from attempting to find a measure of degree of belief that’s consistent with certain desiderata, like “you shouldn’t believe both a thing and its opposite.” So the phrase “probability models degree of belief” is false.
You’re right, I meant to say “probability theory models theoretically optimal degree of belief updates, given other degrees of belief”, or “probability theory models ideally rational degrees of belief.”
Because then you’ll keep arguing for decades about which one it really is, to absolutely no fruitful conclusion. Why not just keep saying that sound is air pressure and not auditory experience, or vice versa? When you do that, it makes it harder to see what is really going on. Call me conservative, but I think we should use as precise a terminology as possible. Also, it seems to me that “probability is degree of belief” is an unverifiable claim, or at least I do not know what experiences I should test it with. But really, even in your own writing you don’t feel comfortable using the copula as the relation between probability and degree of belief without italicizing it. Doesn’t that make you think that maybe there is a better word for the relation, one which you wouldn’t feel you need to italicize? How about “models”? And really we shouldn’t be using probability as a noun, it’s a function not an object, but we can deal with that later.
Exactly what about my article suggests that we should change our terminology to legitimize frequentism? I am saying that frequentism and subjective bayesianism both fail the moment they use the copula with probability as the subject; that is a stupid thing to do in philosophy. It’s as bad as Hegel. “Probability” is not a noun, it is a function; it is syncategorematic like “the”, “or”, “sake”, etc. It is not categorematic; “probability” does not have a physical extension. And there are things that volume has in common with degree of belief, which we might call probability-like behavior. Again, if we found that degree of belief wasn’t modeled by probability theory, we would say that subjective bayesianism was wrong, not that probability theory does not really describe probability. If “subjective belief” did mean probability instead, then if we found that probability theory did not model ideally rational degree of belief, we would say that Kolmogorov’s axioms need to be fixed, that they don’t really define probability.
Calling them interpretations seems to imply that at most one of them can be correct. “Displacement of a falling object on earth” and “kinetic energy of an 18.6 kg object” aren’t competing interpretations of the math f(x) = 9.8x^2; they’re just two different things the equation applies to.
If the frequentists are making any error, it’s denying that beliefs must be updated according to the Kolmogorov Axioms, not asserting that frequencies can also be treated with the same laws. It’s denying the former that might lead them to apply incorrect methods in inference, which is the only problem that really matters.
There is a dispute, ever hear of the idealists and the realists? Luckily it is over now. But either way, it does not matter why you are using one word to stand for many things; you shouldn’t do it if you can use a terminology that is more widely accepted. I still think that bayesianism is a better interpretation, a much better interpretation than frequentism, but what is it an interpretation of? Is it an interpretation of math? Seems to me like it is an interpretation of typographical string manipulations applied to certain basic strings.
That wasn’t another commenter, that was in my article, I’m pretty sure.
If bayesianism wins this argument, which it probably will, it should win because it is the ideal system of statistical inference, not because its proponents managed to convince a bunch of people of a statement with absolutely no empirical consequences. If you argue about what probability is, you argue about surface bubbles of your theory that are just irrelevant to the real dispute you are having, whether you are a realist or an idealist, or a frequentist or a bayesian.
[Deleted]
I think the interpretation of probability and what methods to use for inference are two separate debates. There was a really good discussion post on this a while back.
I’m also curious as to who exactly these frequentists are that you are arguing against. Perhaps I am spoiled by hanging out with people who regularly have to solve statistical problems, and therefore need to have a reasonable conception of statistics, but most frequentist sentiments that I encounter are fairly well-reasoned, sometimes even pointing out legitimate issues with Bayesian statistics. It is true that I sometimes get incorrect claims that I have to correct, but I don’t think becoming a Bayesian magically protects you from this.
EDIT: To clarify, the “frequentist sentiments” I referred to did not explicitly distinguish between interpretations of probability and inference algorithms, but as the goal was engineering I think the arguments were all implicitly pragmatic.
I completely agree with this. It seems to me that we should completely throw away the question of what probability is, and look at which form of inference is optimal.
I’m going by what I’ve read of Jaynes, Yudkowsky, and books by a couple of other writers on Bayesian statistics.
I don’t believe there are any legitimate issues with Bayesian statistics, because Bayes’s rule is derived from basic desiderata of rationality which I find entirely convincing, and it seems to me that the maximum entropy principle is the best computable approximation to Solomonoff induction (although I’d appreciate other opinions on that).
There may be legitimate issues with people failing to apply the simple mathematical laws of probability theory correctly, because the correct application can get very complicated—but that is not an issue with Bayesian statistics per se. I’m sure that in many cases, the wisest thing to do might be to use frequentist methods, but being a Bayesian does not prohibit someone from applying frequentist methods when they are a convenient approximation.
The two issues that come to mind are the difficulty of specifying priors and the computational infeasibility of performing Bayesian updates.
I don’t think anyone can reasonably dispute that if the correct prior is handed to you, together with a black box for applying Bayes’ rule, then you should perform Bayesian updates based on your data to get a posterior distribution. That is simply a mathematical theorem (Bayes’ theorem). And yes, it is also a theorem (Cox’s theorem) that any rational agent is implicitly using a prior. But we aren’t yet in a position to create a perfectly rational agent, and until we are, worrying about the specific form of consistency that is invoked for Cox’s theorem seems silly.
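To make the "prior plus black box" picture concrete, here is a minimal sketch of a discrete Bayesian update in Python. The two coin hypotheses, their likelihoods, and the function name `bayes_update` are assumptions chosen purely for illustration; nothing here is taken from the thread.

```python
from fractions import Fraction

def bayes_update(prior, likelihood, observation):
    """Return the posterior over hypotheses after one observation.

    prior: dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> dict of P(observation | hypothesis)
    """
    # Numerator of Bayes' theorem for each hypothesis.
    unnormalized = {h: prior[h] * likelihood[h][observation] for h in prior}
    # The evidence term P(observation) normalizes the posterior.
    evidence = sum(unnormalized.values())
    return {h: weight / evidence for h, weight in unnormalized.items()}

# Hypothetical setup: a coin is either fair or biased toward heads.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
likelihood = {
    "fair": {"H": Fraction(1, 2), "T": Fraction(1, 2)},
    "biased": {"H": Fraction(3, 4), "T": Fraction(1, 4)},
}

# Feed in the observed flips one at a time.
posterior = prior
for flip in "HHT":
    posterior = bayes_update(posterior, likelihood, flip)

print(posterior)
```

The update itself is mechanical once the prior and likelihoods are supplied; the practical difficulty raised above lies in choosing those inputs and in carrying out the computation when the hypothesis space is large.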
It’s possible that we don’t really disagree. As a purely abstract statement about what you should do given unlimited computational resources, sure, Solomonoff induction is the way to go. I definitely agree with that. But if you need to actually solve a specific practical problem, additional considerations come into play.
By the way, what do you mean by “the maximum entropy principle is the best computable approximation to Solomonoff induction”? That sounds intriguing, so I’d be interested to have you elaborate a bit.
Regarding frequentism vs. Bayesianity in practical applications, the message I take from Yudkowsky and Jaynes is that frequentists have tended historically to lack apprehension of the fact that their methods are ad-hoc, and in general they fail to use Bayesian power when it is in fact advisable to do so—whereas Bayesians feel they can use ad-hoc approximate methods or accurate methods, whichever is appropriate to the task. This is a case in which a questionable philosophy needn’t hamstring someone’s thinking in principle, but appears to do so fairly predictably as a matter of fact.
Incidentally I’m surprised that there appears to be so much disagreement about this, given that LW is basically a forum brought into existence on the strength of Yudkowsky’s abilities as a thinker, writer and populariser, and he clearly holds frequentism in contempt. It’s not necessarily a bad thing that some people here are sympathetic to frequentism—intellectual diversity is good—I’m just surprised that there are so many on a Bayesian rationality forum!
About Maxent: I had in mind chapter 5 of this book by Li and Vitanyi.
This is the MDL (minimum description length) principle.
Where K is Kolmogorov complexity.
So ideal MDL, like Solomonoff induction, is also incomputable!
They go on to discuss approximations, and on page 390 (I don’t know if you have a copy of the book) they provide a usable approximation to be referred to as “MDL”. Later on page 398 they discuss Maxent, and conclude that that too is an approximation to ideal MDL.
As far as I can see, Maxent is more useful in practical applications than their approximate MDL. I felt that Maxent needed to be defended, since Jaynes considered it to be a major element of Bayesian probability theory; and as far as I can see there is no clearly better practical method of generating priors at this point in time such that Maxent could be considered to be one of Bayesianity’s “legitimate issues” vis a vis frequentism.
My intuition here is that you are not observing so many people who are sympathetic to frequentism, so much as people who are unsympathetic to holding contempt.
In much of the comments here you seem to be missing a simple point about mathematics and reference due to its relationship to tribal signaling between the “Bayesians” and the “Frequentists”.
I’ve yet to see anything in this article, or the resulting comments thread, to suggest that the OP has anything to say apart from “let’s say ‘models’ instead of ‘is’ (but mean the same thing)”. And the only consequence of this is to puff up frequentism.
I tried (and apparently failed miserably) to make the case that in the interests of sanity, we should define our terms such that probability ≡ subjective degrees of belief. That’s all it is, a definition—there’s no philosophical significance to this “is” beyond that. It is not a claim that the frequency interpretation doesn’t fit Cox’s postulates—this is a naive interpretation of how language is used on the OP’s part.
The definitional dispute about sound is inapt, because there is nothing to be gained by defining sound as one thing or the other. In this case however there is a real benefit to defining our terms in one particular way.
I will however delete the downvoted posts in this thread, to honour the great disapproval with which my conception of rationality has apparently met in this case.
Generally, deleting posts with responses is impolite, as the discussion may be helpful to future readers.
I don’t think you ever supplied a term other than “probability” that we should use for what the OP thought “probability” means. So we’re still left with three entities and two words.
Seems like a non-problem. Just say “I am entering these frequencies into Bayes’s theorem”, “I am using the mathematical tools of probability theory” or something like that.
Or perhaps say “probability is a measure of subjectively objective degrees of belief”, and “probability theory is the set of mathematical tools used to compute probabilities, which can also be used to compute frequencies as the case may be”.
Which is pretty much what happens already! This is why I object to such an article—it’s a solution looking for a problem, which creates the illusion of a problem by a) being illiterate, so making itself hard to pin down b) nitpicking the use of words.
They were also steadily generating an amount of negative karma days after posting that I felt was disproportionate, considering they were a sincere attempt to reach agreement with a less-than-articulate interlocutor.
Would not retraction have served?
I did not find User:potato less-than-articulate.
I’m not sure what you mean by “illiterate” here, nor (thus) how it would make itself ‘hard to pin down’.
The dispute was about the proper use of words. I did not see anything that looked like ‘nitpicking’ in that context.
The advantage of “Formalism” over “Bayesianism” or “Frequentism” is that it clearly marks the mathematical toolkit, makes it clear what Bayesians and Frequentists are separately talking about, gets rid of the slippage Frequentists allegedly make between “degrees of belief” and “frequencies”, and removes the question of what “probability” is “really” about, all without having to raise a flag in the mind-killing tribal warfare between “Bayesians” and “Frequentists”.
But then, it’s been noted that “a philosopher has never met a distinction he didn’t like”, so perhaps I’m just biased in favor of making clearer the distinction.
So in “formalism”, I understand that we are to say: “probability models frequency”, “probability models subjective degrees of belief” and “probability is the set of mathematical discoveries we have made, which deal with [ ], including such things as Bayes’s theorem”.
Whereas at the moment, Bayesians say: “probability is a measure of subjective degrees of belief”, “probability isn’t frequency”, and “probability theory is the set of mathematical discoveries we have made, which deal with probability, including such things as Bayes’s theorem”.
And frequentists say: “probability is long-run frequency”, “probability isn’t subjective degrees of belief”, and “probability theory is the set of mathematical discoveries we have made, which deal with probability, including such things as Bayes’s theorem”.
I like the Bayesian version. But the frequentist version doesn’t confuse me; I understand perfectly well that these are merely competing interpretations, and I’ve never felt the urge to argue specifically about whether probability is degrees of belief or is frequency—nor have I ever seen anyone else do so. Clearly that would be a stupid argument, just like the definitional dispute about sound. However, sensible people do use these terms, arguing about whether probability ‘is’ one or the other, as a proxy for a more substantive argument about which is the better—i.e. more philosophically parsimonious, and having better practical outcomes—interpretation. (Actually they are more likely to phrase the argument as “probability should be considered to be X”, and then say probability is X when they aren’t having the argument, but hey.)
As for the “formalist” version, firstly it puts the frequentist and Bayesian interpretations on a level footing. Even if sensible people were wasting time and effort arguing specifically over a mere definition, the cost of conceding ground to the problematic frequentist interpretation outweighs any benefit from ending that argument, in comparison to the option of simply carrying on using the language of the Bayesian.
Furthermore it appears to me that probability theory, given this use of language, lacks a referent. Probability theory has been renamed (simply) probability, and it no longer appears to be the theory of anything. Whether or not this use of language could be considered wrong per se, it hardly seems to be clearing up any philosophical confusion! If I ask “what is this thing that I am computing using Bayes’s theorem?”, the answer is no longer “the posterior probability”—if probability is the new word for the mathematical tools of probability theory, the phrase posterior probability no longer means anything. So perhaps I’ll have to invent a new word to refer to the same thing that the word probability used to refer to.
Do you begin to see why I think this is a waste of time?
NB: I think we’re making much more progress than I made with user:potato. That’s what I mean about the difficulty of having to argue with someone who is inarticulate, i.e. can’t state his case properly.
Probably better put in terms of being a formal system, rather than “a set of mathematical discoveries”. But I fear that tends towards begging the question!
This treatment (notably the use of terms like “conceding ground”) suggests that you are engaging in a “political”/”debate” mode rather than a “truth-seeking” mode. This leads me to believe that we have more to lose by accepting the “Bayesian/Frequentist” duality than by dissolving it entirely and changing our terminology to match. This matches my impression of previous forays into the “Bayesian/Frequentist” ‘holy wars’.
If politics is mind-killing, then it must certainly be avoided even at great cost with respect to our most basic tools of rationality.
Indeed, though in that case you’ve spent far more time on this than most who exercised the default ‘ignore’ option.
A good point.
I understood what you meant—I just did not see any inarticulateness on the part of User:potato.
I normally see this being explicitly the subject on Bayesian/Frequentist debates, and many long conversations with philosophers have revolved around whether “equating probability with subjective belief” is an “ontological confusion”.
Duly noted. I’ll try not to give this impression in future.
I may have simply failed to notice these arguments taking place. In order to dissolve any such ostensible ontological question, I’d recommend pointing out that to say probability is one or other thing is merely a statement to the effect that one interpretation is preferred for some reason by the writer—since both interpretations satisfy the Cox postulates or Kolmogorov axioms, we could define probability to be either subjective degrees of belief or long-run frequency, and make sound and rational inferences in either case (albeit perhaps not with the same efficiency). This should be enough to persuade an otherwise sensible person that he’s engaged in a futile argument about definitions.
Formalism attempts to solve the problem by effectively tabooing the concept of probability such that it no longer has a definition. Although we might be able to get around the problem that I mentioned by answering the question “what is this thing that I am computing using Bayes’s theorem?” by saying “the posterior subjective degree of belief” or “the posterior frequency”, it’s easy to see how the same kind of philosophers would end up arguing over whether, in the case of a coin flip for example, we are really talking about prior and posterior subjective degrees of belief, or about prior and posterior long-run frequencies. And we would have lost the use of the word “probability”, which makes our messages shorter than they would otherwise be.
To the extent that there is such a thing as the proper use of words, to delete useful words from our vocabulary in order to (probably unsuccessfully) prevent people from having a definitional argument that could best be dispelled by introducing them to such notions as “dissolving the question” and reductionism isn’t it. On the other hand I’ll give user:potato credit for exposing an issue that may be more problematic than I at first believed.
I expect that we are substantially in agreement at this point.
FWIW, I think my three preferred terms are “Probabilities”, “Frequencies”, and “Normed Measure Theory”. That’s what Kolmogorov’s formalization amounts to anyway, and as the OP said it truly need not be connected to either probabilities or frequencies in a given use.
I don’t understand. Based on reading through the passages you referenced in PtLoS, maximum entropy is a way of choosing a distribution out of a family of distributions (which, by the way, is a frequentist technique, not a Bayesian one). Solomonoff induction is a choice of prior. I don’t really understand in what sense these are related to each other, or in what sense Maxent generates priors at all.
I’ve always felt that the frequentists that Eliezer argues against are straw men. As I said earlier, I’ve never met a frequentist who is guilty of the accusations that you keep making, although I have met Bayesians whose philosophy interfered with their ability to do good statistical modeling / inference. Have you actually run into the people who you seem to be arguing against? If not, then I think you should restrict yourself to arguing against opinions that people are actually trying to support, although I also think that whether or not some very foolish people happen to be frequentists is irrelevant to the discussion (something Eliezer himself discussed in the “Reversed Stupidity is not Intelligence” post).
If you know nothing about a variable except that it’s in the interval [a, b], your probability distribution must be from the class of distributions where p(x) = 0 for x outside of [a, b]. You pick the distribution of maximal entropy from this class as your prior, to encode ignorance of everything except that x ∈ [a, b].
That is one way Maxent may generate a prior, anyway.
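As a rough numerical illustration of that claim, the sketch below discretizes the interval into equal-width bins and compares the Shannon entropy of the uniform distribution with two non-uniform alternatives. The bin count and the two alternative distributions are arbitrary assumptions made for the example, and a discrete comparison like this only suggests, rather than proves, the continuous result.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

n_bins = 10  # hypothetical discretization of [a, b] into equal-width bins

# Uniform: every bin gets the same mass.
uniform = [1 / n_bins] * n_bins

# A distribution that piles extra mass on the first bin.
peaked = [0.55] + [0.05] * (n_bins - 1)

# A distribution whose mass rises linearly across the interval.
total = sum(range(1, n_bins + 1))
ramp = [k / total for k in range(1, n_bins + 1)]

for name, dist in (("uniform", uniform), ("peaked", peaked), ("ramp", ramp)):
    print(name, round(entropy(dist), 4))
```

Among distributions supported on the same bins, the uniform one has the largest entropy, which is the sense in which it encodes knowing nothing beyond the support.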
We can call dibs on things now? Ooh, I call dibs on approximating a slowly varying function as a constant!
I’m pretty sure almost all frequentist methods are derivable from bayes, or are close approximations of bayes. Do they have any tool which is radically un-bayesian?
See paulfchristiano’s examples elsewhere in this thread.
Another example would be support vector machines, which work really well in practice but aren’t Bayesian (although it’s possible that they are actually Bayesian and I just can’t figure out what prior they correspond to).
There are also neural networks, which are sort of Bayesian but (I think?) not really. I’m not actually that familiar with neural nets (or SVMs for that matter) so I could just be wrong.
ETA: It is the case that every non-dominated decision procedure is either a Bayesian procedure or the limit of Bayesian procedures (which I think could alternately be thought of as a Bayesian procedure with a potentially improper prior). So in that sense, for any frequentist procedure that is not Bayesian, there is another procedure that gets higher expected utility in all possible worlds, and is therefore strictly better. The only problem is that this is again an abstract statement about decision procedures, and doesn’t take into account the computational difficulty of actually finding the better procedure.
This paper is the closest I’ve ever seen to a fully Bayesian interpretation of SVMs; mind you, the authors still use “pseudo-likelihood” to describe the data-dependent part of the optimization criterion.
Neural networks are just a kind of non-linear model. You can perform Bayes upon them if you want.
That is my understanding, too. Frequentists claim not to have priors, but in fact they just use uninformative priors implicitly. In a more fundamental sense, if they genuinely had no priors then they would be unable even to interpret the results of an experiment.
There is a dispute, ever hear of the idealists and the realists? Luckily it is over now. But either way, it does not matter why you are using one word to stand for many things; you shouldn’t do it if you can use a terminology that is more widely accepted. I still think that bayesianism is a better interpretation, a much better interpretation than frequentism, but what is it an interpretation of? Is it an interpretation of math? Seems to me like it is an interpretation of typographical string manipulations applied to certain basic strings.
That wasn’t another commenter, that was in my article, I’m pretty sure.
If bayesianism wins this argument, which it probably will, it should win because it is the ideal system of statistical inference, not because its proponents managed to convince a bunch of people of a statement with absolutely no empirical consequences. If you argue about what probability is, you argue about surface bubbles of your theory that are just irrelevant to the real dispute you are having, whether you are a realist or an idealist, or a frequentist or a bayesian.
I don’t think that’s where I meant to put that comment.
See, these questions are not confusing to me at all. Hofstadter’s formalism deals with them perfectly. Have you ever read G.E.B.? I assumed so, but I wasn’t sure, maybe you haven’t.
Yes, I do think that probability theory is a repeatable process of typographical string manipulations. What do you think it is?
Here I completely disagree, and almost wonder if you have been reading my comments. Bayesianism is stronger, more capable, more perfect, more rational, and more useful than frequentism, first of all, and all of that has nothing to do with the commitment to conceptualism that subjective bayes requires. This is all still true if you are a formalist.
Bayesianism is not righter than frequentism because probabilities are really subjective beliefs and the frequentists were wrong to say they are frequencies. Bayesians are righter than frequentists because bayes-inferences are deductively demonstrable to win more than frequentist-inferences. Again, the argument about what probability really is is just a way to disguise the argument about whose statistical method is more successful; the only way the frequentist even has a shot at such an argument is if it is disguised as a question about what probability is instead of a question about whose inferences are theoretically ideal.
So um, platonism? Really? Why? What does it get you that formalism doesn’t with less ontological commitment?
I think you have this backwards. Frequentist techniques typically come with adversarial guarantees (i.e., “as long as the underlying distribution has bounded variance, this method will work”), whereas Bayesian techniques, by choosing a specific prior (such as a Gaussian prior), are making an assumption that will hurt them in extreme cases or when the data is not drawn from the prior. The tradeoff is that frequentist methods tend to be much more conservative as a result (requiring more data to come to the same conclusion).
If you have a reasonable Bayesian generative model, then using it will probably give you better results with less data. But if you really can’t even build the model (i.e. specify a prior that you trust) then frequentist techniques might actually be appropriate. Note that the distinction I’m drawing is between Bayesian and frequentist techniques, as opposed to Bayesian and frequentist interpretations of probability. In the former case, there are actual reasons to use both. In the latter case, I agree with you that the Bayesian interpretation is obviously correct.
Bayesian methods with uninformative (possibly improper) priors agree with frequentist methods whenever the latter make sense.
Can you explain further? Casually, I consider results like compressed sensing and multiplicative weights to be examples of frequentist approaches (as do people working in these areas), which achieve their results in adversarial settings where no prior is available. I would be interested in seeing how Bayesian methods with improper priors recommend similar behavior.
I admit I’m not familiar with either of those… Can you make a simple example of an “adversarial setting where no prior is available”?
I let you choose some linear functionals, and then tell you the value of each one on some unknown sparse vector (compressed sensing).
We play an iterated game with unknown payoffs; you observe your payoff in each round, but nothing more, and want to maximize total payoff (multiplicative weights).
Put even more simply, what is the Bayesian method that plays randomly in rock-paper-scissors against an unknown adversary? Minimax play seems like a canonical example of a frequentist method; if you have any fixed model of your adversary you might as well play deterministically (at least if you are doing consequentialist loss minimization).
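The rock-paper-scissors point can be checked directly. The sketch below computes the worst-case expected payoff of a mixed strategy over every pure response by the adversary; the payoff convention (+1 win, 0 tie, -1 loss) and all names are assumptions made for illustration.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """+1 if my move wins, -1 if it loses, 0 on a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected_payoff(strategy, opponent_move):
    """Expected payoff of a mixed strategy against one fixed opponent move."""
    return sum(p * payoff(move, opponent_move) for move, p in strategy.items())

def worst_case(strategy):
    """Guaranteed expected payoff: the minimum over all opponent moves."""
    return min(expected_payoff(strategy, opp) for opp in MOVES)

uniform = {move: Fraction(1, 3) for move in MOVES}
always_rock = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}

print("uniform worst case:", worst_case(uniform))
print("always-rock worst case:", worst_case(always_rock))
```

Because the expected payoff is linear in the adversary's mixture, checking pure responses is enough: the uniform strategy guarantees the same expected payoff against any opponent, while a deterministic strategy can be fully exploited by an adversary who anticipates it.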
The minimax estimator can be related to Bayesian estimation through the concept of a “least-favorable prior”.
Are you referring to the result that every non-dominated decision procedure is either a Bayesian procedure or a limit of Bayesian procedures? If so, one could imagine a frequentist procedure that is strictly dominated by other procedures, but where finding the dominating procedures is computationally infeasible. Alternately, a procedure could be non-dominated, and thus Bayesian for the right choice of prior, but the correct choice of prior could be difficult to find (the only proof I know of the “non-dominated ⇒ Bayesian” result is non-constructive).
Thanks for the clarification.
What I was trying to emphasise is that, pace “potato”, the frequentist/Bayesian dispute isn’t just an argument about words but actually has ramifications for how one is likely to approach statistical inference—so it shouldn’t be compared to the definitional dispute “If a tree falls in a forest and no one hears it, does it make a sound?”
If someone treated frequentist approaches as though they were equivalent to Bayesian methods in general, then he would occasionally be drastically in error. PT:TLoS offers many examples of this (for example the comparison of a Bayesian “psi test” and the chi-squared test on page 300). My comment about the Gaussian distribution had in mind Jaynes’s discussion of “pre-data and post-data considerations” starting on page 499, in which he discusses the fact that orthodox practice answers the wrong question: it gives correct answers to “if the hypothesis being tested is in fact true, what is the probability that we shall get data indicating that it is true?” when the real problems of scientific inference are concerned with the question “what is the probability conditional on the data that the hypothesis is true?”, and this problem is the result of frequentist philosophy’s failure to admit the existence of prior and posterior probabilities for a fixed parameter or an hypothesis. He suggests that this conflation goes somewhat unnoticed because in the case of the commonly encountered Gaussian sampling distribution the difference is relatively unimportant, but compares another case (Cauchy sampling distributions) in which the Bayesian analysis is far superior.
On the other hand the interlocutors in the standard definitional dispute have no substantive disagreement, i.e. they actually anticipate the same things, so their disagreement amounts to nothing apart from the fact that they waste their time arguing about words.
I’ll defer to your opinion (which is probably much better informed than mine) on whether frequentist methods work well when their limitations are borne in mind.
Why can’t a frequentist say: “Bayesians are conflating probability with subjective degree of belief.” ? They were here first after all.
Probability does model frequency, and it does model subjective degree of belief, and this is not a contradiction. Using the copula is the problem, obviously: if subjective degree of belief is not frequency, and probability is frequency, then probability is not subjective degree of belief. Analogously, if subjective degree of belief is not frequency, and probability is subjective degree of belief, then probability is not frequency.
The problem is that they all conflate “probability” with “subjective degree of belief” and “frequency”, the bayesian conflates subjective degree of belief and probability. The frequentist conflates probability and frequency.
The debate over whether to use Bayesian methods or frequentist methods is of import. I think potato was trying to say this here:
But the question of whether probability is frequency, or if probability is subjective degree of belief, is just as silly as a dispute over whether numbers are quantity, or if they are orders. The answer is that numbers model both, and are neither.
No, probability models frequency in the sense that there is an interpretation of Kolmogorov’s axioms which only mentions terms from the part of our language used to talk about frequency, and all of Kolmogorov’s theorems come out as true statements about frequency under this interpretation.
I mean, literally, Bayes is an arithmetic of odds and fractions, of course it models frequency. At least as well as fractions and odds do. Probability is a frequency as often as it is a fraction or an odds.
They don’t, but could and should.
I agree, this is why instead of saying that probability is identical to frequency, or that it is frequency, we should say that it models frequency.
What is this comment supposed to add? Is it an ad hominem, or are you asking for clarification? If you don’t understand that comment perhaps you should try rereading my original post, I have updated it a bit since you first commented, perhaps it is clearer.
(edit) clarification:
The reason that probabilities model frequency is not because our data about some phenomena are dominated by facts of frequency. If you take 10 chips, 6 of them red, 4 of them blue, 5 red ones and 1 blue one on the table, and the rest not on the table, you’ll find that bayes can be used to talk about the frequencies of these predicates in the population. You only need to start with theorems that when interpreted produce the assumptions I just provided, e.g., P(red and on the table) = 1⁄2, P(~red and on the table) = 1⁄10, P(red and ~on the table) = 1⁄10. From those basic statements we can infer using bayes all the following results: P(red|on the table) = 5⁄6, P(~red|on the table) = 1⁄6, P( (red and on the table) or blue) = 9⁄10, P(red) = P(red|on the table) P(on the table) + P(red|~on the table) P(~on the table) = 6⁄10, etc. These are all facts about the FREQUENCY distributions of these chips’ predicates, which can be reached using bayes, and the assumptions above. We can interpret P(red) as the frequency of red chips out of all the chips, and P(red|on the table) as the frequency of red chips out of chips on the table. You’ll find that anything you prove about these frequencies using bayesian inference will be a true claim about the frequencies of these predicates within the chips. Hence, bayes models frequency. This is all I meant by bayes models frequency. You’ll also find that it works just as well with volume or area. (I am sorry I wasn’t that concrete to begin with.)
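The chip example can be checked by brute-force counting. The following is a minimal sketch in Python, with the ten chips encoded as hypothetical (color, on_table) pairs and a helper `freq` that is an illustrative name, not anything from the original post. It computes each quantity as a plain frequency and confirms that the law of total probability gives the same answer as counting red chips directly.

```python
from fractions import Fraction

# Hypothetical encoding of the ten chips as (color, on_table) pairs.
chips = (
    [("red", True)] * 5      # five red chips on the table
    + [("blue", True)] * 1   # one blue chip on the table
    + [("red", False)] * 1   # the remaining red chip, off the table
    + [("blue", False)] * 3  # the remaining blue chips, off the table
)

def freq(pred, given=lambda chip: True):
    """Frequency of chips satisfying pred among chips satisfying given."""
    pool = [c for c in chips if given(c)]
    return Fraction(sum(1 for c in pool if pred(c)), len(pool))

is_red = lambda c: c[0] == "red"
is_blue = lambda c: c[0] == "blue"
on_table = lambda c: c[1]
off_table = lambda c: not c[1]

# Conditional frequencies read straight off the population.
print("P(red | on table) =", freq(is_red, on_table))
print("P(~red | on table) =", freq(is_blue, on_table))
print("P((red and on table) or blue) =",
      freq(lambda c: (is_red(c) and on_table(c)) or is_blue(c)))

# Law of total probability, built from conditional frequencies.
p_red_total = (freq(is_red, on_table) * freq(on_table)
               + freq(is_red, off_table) * freq(off_table))
print("P(red) via total probability =", p_red_total)
print("P(red) by direct count =", freq(is_red))
```

Every value is an exact fraction of chip counts, so the identities of probability theory hold here as statements about frequencies in a finite population.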
In the same exact way, you can interpret probability theorems as talking about degrees of belief, and if you ask a bayesian, all those interpreted theorems will come out as true statements about rational degree of belief. In this way bayes models rational belief. You can also interpret probability theory as talking about Boston’s night life, but not every one of those interpreted theorems will be true, so probability theory does not model Boston’s night life under that interpretation. To model something means to produce only true statements about that something under a given interpretation.
Frequentists may not treat their tool box as a set of mostly unrelated approximations to perfect learning, or treat bayes as the optimal laws of inference, but they should as far as I can tell. And if they did, they would not cease to be frequentists; they would still use the same methods, use “probability” the same way, and still focus on long-run frequency over evidential support. The only difference is that rather than saying probability is frequency and that probability is not subjective degree of belief, they would say that probability models both frequency and subjective degree of belief. Subjective bayesians should make a similar update, though I am sure they don’t swing the copula around as liberally as frequentists. This is what I meant when I said that frequentists could and should believe that frequentism is just a useful approximation, and that bayes is in some sense optimal. I was never really arguing about the practical advantages of bayesianism over frequentism, but about how they both seem to make a similar philosophical mistake in using identity or the copula when the relation of modeling is more applicable. A properly Hofstadterish formalism seems like the best way to deal with all of this comprehensively.
You understand what I was saying now? I really want to know. That you are confused by what seem to me to be my most basic claims, and that you are also as familiar with E. T. Jaynes as your comments suggests is worrying to me. Does this clarification make you less confused?
.
Fine, let’s make up a new frequentism, which is probably already in existence: finite frequentism. Bayes still models finite frequencies, like the example I gave of the chips.
When a normal frequentist would say “as the number of trials goes to infinity”, the finite frequentist can say “on average” or “the expectation of”. Rather than saying that as the number of die rolls goes to infinity the fraction of sixes is 1⁄6, we can just say that as the number rises the fraction stabilizes around, and gets closer to, 1⁄6. That is a fact which is finitely verifiable. If we saw that the more die rolls we added to the average, the closer the fraction of sixes approached 1⁄2, and the closer it hovered around 1⁄2, the frequentist claim would be falsified.
There may be no infinite populations. But the frequentist can still make do with finite frequencies and expected frequencies, and I am not sure what he would lose. There are certainly finite frequencies in the world, and average frequencies are at least empirically testable. What can the frequentist do with infinite populations or trials that he/she can’t do with expected/average frequencies?
Also, are you a finitist when it comes to calculus? Because the differential calculus requires much more commitment to the idea of a limit, infinity, and the infinitesimal than frequentists require, if frequentists require these concepts at all. Would you find a finitist interpretation of the calculus to be more philosophically sound than the classical approach?
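To make the “finitely verifiable” reading concrete, here is a rough sketch of the check. It uses a simulated die, so fairness is built in by assumption; with a physical die the same bookkeeping would apply to recorded rolls. The function name and the seed are hypothetical choices of mine.

```python
import random

def running_fraction_of_sixes(n_rolls, seed=0):
    """Return the fraction of sixes observed after each of n_rolls rolls."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    sixes = 0
    fractions = []
    for i in range(1, n_rolls + 1):
        if rng.randint(1, 6) == 6:  # simulated fair die (an assumption)
            sixes += 1
        fractions.append(sixes / i)
    return fractions

fractions = running_fraction_of_sixes(100_000)

# Inspect the running fraction at a few finite checkpoints.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(fractions[n - 1], 4))
```

If the printed fractions settled near 1⁄2 instead of near 1⁄6, that would count against the claim, and no infinite sequence is needed to see it.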
potato,
I don’t think there’s much value in replying to Phlebas’ latest reply.
.
Except one makes it seem like the stuff does exist and the other makes it seem like it doesn’t. If we interpret the law of large numbers as saying that after an infinite number of trials, the average value of that sequence of results will equal the expected value of the random variable, then no finite amount of evidence is enough to be evidence for this interpretation, let alone to verify it. But if we interpret the law as saying that the more trials we add, the more closely the average result should hover around the expected value of the variable, then that interpretation can be falsified and evidenced empirically using only finite observations.
.
Ok, I am a Bayesian, i.e., I use Bayesianism over frequentism, and find frequentist methods rather silly. And I am at least what I would call a finite frequentist, i.e., I think Kolmogorov models finite frequency.
I am not here to say that Bayesianism is on equal ground with frequentism, at all. If Bayes’s interpreted sentences can be empirically verified, and frequentist interpretations cannot be empirically verified, then this is in Bayesianism’s favor. It means it is the more useful theory. But it is not grounds to use “is” where we should use “models” instead. It is not because Bayesians need to be put on equal footing with frequentists that I propose this terminology in place of the copula; it is because rationalists should be clear, especially in philosophy. So just to be clear, we should use “model” instead of “is” because it is what is really going on; the concepts of Hofstadterish formalism and model theory are the best way to understand how probability theory ends up telling us how to distribute beliefs. The relation between subjective degree of belief and probability theory is clearly not identity, or the copula.
Subjective degrees of belief are a part of your cognition. Probability theory is a repeatable process consisting in shuffling squiggles on paper, or some other medium. These are not identical. Q.e.d.
We might call this the “paper projection fallacy”, where you project some pattern in squiggles on a piece of paper into your mind. It is analogous to the “mind projection fallacy”.
.
No, probability, or I assume you really mean rationality, is the void.
Bayes is just playing with squiggles on paper. If, when you interpreted Bayes, you found some claim which seemed not to work, you would have to abandon Bayes or be irrational.
What probability where? If you start by saying degree of belief is probability, and then show that degree of belief is probability, I am not impressed. You can call them “the representation” instead of calling them “the theory” if you want. And you can use “is” instead of “model” if you want. And you can even use “probability” instead of “degree of belief”, though I suspect that may all get rather confusing quickly. But do realize that every reason you give for saying that probability is degree of belief, a frequentist can give for saying that probability is frequency.
“Probability” is a really stupid noun, kind of like “red-hood”, or “emergence”. Notice how in the actual theory, we only ever talk about the probability of something. “Probability” is a function, not an object. Ask yourself: “what IS probability?” Really probe, and you’ll find that that is a stupid question. The right question would have been, “what does probability return given an argument?” The answer is that it might return the rational degree of belief of a proposition, the frequency of a predicate out of a finite population, the frequency of a value out of an infinite number of trials, the volume of a space, the area of a shape, or even the length of a line. All of these are consistent with the Kolmogorov axioms.
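A small sketch of the “function, not an object” point, under my own simplifying assumptions: `make_probability` is a hypothetical helper that turns any additive measure into a function P by normalizing against the whole. Only two of the interpretations listed above are shown (finite frequency and length), and only with exact rational arithmetic.

```python
from fractions import Fraction

def make_probability(measure, whole):
    """Build a function P(part) = measure(part) / measure(whole)."""
    total = measure(whole)
    def P(part):
        return Fraction(measure(part), total)
    return P

# Interpretation 1: finite frequency. Arguments are sets of chip ids,
# and the measure is the number of elements.
all_chips = frozenset(range(10))
red_chips = frozenset(range(6))
P_freq = make_probability(len, all_chips)

# Interpretation 2: length. Arguments are (start, end) intervals,
# and the measure is end - start.
def length(interval):
    start, end = interval
    return end - start

P_len = make_probability(length, (0, 10))

print(P_freq(red_chips))  # frequency of red chips among all chips
print(P_len((0, 6)))      # share of the line covered by [0, 6]
```

The same function shape returns a frequency in one case and a length ratio in the other. Nothing in the code says which of the two “is” probability.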
Now for this part:
Well it is actually in the bartender’s premise that the coin is biased, so they both know that whatever heads/trials hovers around as trials rises, it is not 1⁄2.
But assuming they didn’t have that premise, what could the frequentist do, without requiring non-empirically verifiable claims as assumptions?
The only thing I can think of: he/she could resort to ranges, never actually defining the probability of heads, just determining the probability with which the actual probability, i.e. frequency of heads, is within a given range. There is some ideal actual frequency, which would be the outcome given infinitely many trials, but you can only find a range within which it is, and it would require infinite amounts of evidence to constrain heads/trials to a point; and we don’t have that kind of time. Bayes can be extended to ranges of probability trivially. This would make it so that finite observables act as evidence for some hypotheses which include the term “infinity”. But it wouldn’t justify the whole of frequentist methodology.
But again, even if the frequentist interpretation fails in ways which the Bayesian interpretation does not, this is not evidence of probability being degree of belief. It is evidence of probability modeling degree of belief, and of Bayesianism having sounder ontological commitments than frequentism. This would not surprise me.
Infinite frequency is not real. But our intuitions about it are real. Kolmogorov may then be said to model actual finite frequencies, and our intuitions about infinite frequencies which are finitely and axiomatically describable. Let us not forget that there are not circles or squares anywhere either, but we should still hold that you can’t square the circle. Not all models have to be out there; some may be in here. Frequentism requires infinite frequencies, which don’t exist, for its interpretation to be true. The subjective Bayes interpretation of Bayes does not require anything that really doesn’t exist (though degrees of belief are plenty mysterious). This is a good reason to be a subjective Bayesian, and not a frequentist, which I was not aware of consciously, but it is not a good reason to stop being a formalist.
Who cares if frequentists, or non-LW Bayesians, use the copula like a bunch of sillies, even after G.E.B. was published? We LWers should use “identity” if we are claiming identity, and “modeling” if we are claiming a model. But realistically, the claim that “Probability theory models rational belief systems.” seems much more defensible, concrete, and useful than the claim that “Probability is degree of belief.”
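One way the “ranges” idea could be sketched, with loud caveats: this is a Bayesian-style grid approximation with a uniform prior over the unknown frequency of heads, which is my assumption and not something the bartender example specifies. The function name, the grid size, and the 70-heads-in-100 data are all hypothetical.

```python
def prob_frequency_in_range(heads, trials, low, high, grid_size=10_001):
    """Approximate the probability that the frequency of heads lies in
    [low, high], given the observed flips and a uniform prior (assumed)."""
    # Candidate frequencies 0, 1/(grid_size-1), ..., 1.
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Binomial likelihood of the data at each candidate, up to a constant.
    weights = [t ** heads * (1 - t) ** (trials - heads) for t in thetas]
    total = sum(weights)
    inside = sum(w for t, w in zip(thetas, weights) if low <= t <= high)
    return inside / total

# Made-up data: 70 heads in 100 flips.
print(prob_frequency_in_range(70, 100, 0.6, 0.8))    # range around 0.7
print(prob_frequency_in_range(70, 100, 0.45, 0.55))  # range around 1/2
```

A finite run of flips shifts weight between ranges without ever pinning the frequency to a point, which is all the paragraph above asks for.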