BTW induction is a type of justificationism. Bayesians advocate induction. The end.
I have no confidence the word “induction” means the same thing from one sentence to the next. If you could directly say precisely what is wrong with Bayesian updating, with concrete examples of Bayesian updating in action and why they fail, that might be more persuasive, or at least more useful for diagnosis of the problem wherever it may lie.
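For concreteness, here is a minimal sketch of the kind of updating I mean (hypothetical numbers, and an illustration of Bayes’ rule rather than anyone’s endorsed method):

```python
# Minimal sketch of one Bayesian update (made-up numbers).
# Two rival hypotheses about a coin: fair vs. biased toward heads.
priors = {"fair": 0.5, "biased": 0.5}            # assumed starting credences
likelihood_heads = {"fair": 0.5, "biased": 0.8}  # P(heads | hypothesis)

def update(priors, likelihoods):
    """Return posterior P(H | data) via Bayes' rule."""
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

posterior = update(priors, likelihood_heads)  # after observing one head
print(posterior)  # {'fair': ~0.385, 'biased': ~0.615}
```

If something in this procedure fails, I would like to see where.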
For example, you update based on selective observation (all observation is selective).
And you update theories getting attention selectively, ignoring the infinities (by using an arbitrary prior, but not actually using it in the sense of applying it infinitely many times, just estimating what you imagine might happen if you did).
Why don’t you—or someone else—read Popper? You can’t expect to fully understand it from discussion here. Why does this community have no one who knows what they are talking about on this subject? Who is familiar with the concepts instead of just having to ask me?
For example, you update based on selective observation (all observation is selective).
Since all observation is selective, then the selectivity of my observations can hardly be a flaw. Therefore the flaw is that I update. But I just asked you to explain why updating is flawed. Your explanation is circular.
And you update theories getting attention selectively, ignoring the infinities (by using an arbitrary prior, but not actually using it in the sense of applying it infinitely many times, just estimating what you imagine might happen if you did).
I don’t know what that means. I’m not even sure that
you update theories getting attention selectively
is idiomatic English. And
ignoring the infinities
needs to be clarified, I’m sure you must realize. Your parenthetical does not clear it up. You write:
by using an arbitrary prior, but not actually using it
Using it but not actually using it. Can you see why that is hard to interpret? And then:
in the sense of applying it infinitely many times
My flaw is that I don’t do something infinitely many times? Can you see why this begs for clarification? And:
just estimating what you imagine might happen if you did
I’ve lost track: is my core flaw that I am estimating something? Why?
Why don’t you—or someone else—read Popper?
I’ve read Popper. He makes a lot more sense to me than you do. He is not cryptic. He is crystal clear. You are opaque.
You can’t expect to fully understand it from discussion here. Why does this community have no one who knows what they are talking about on this subject? Who is familiar with the concepts instead of just having to ask me?
You seem to be saying that for me to understand you, first I have to understand Popper from his own writing, and that nobody here seems to have done that. I’m not sure why you’re posting here if that’s what you believe.
If you’ve read Popper (well) then you should understand his solution to the problem of induction. How about you tell us what it is, rather than complaining my summary of what you already read needs clarifying.
If you’ve read Popper (well) then you should understand his solution to the problem of induction. How about you tell us what it is, rather than complaining my summary of what you already read needs clarifying.
You are changing the subject. I never asked you to summarize Popper’s solution to the problem of induction. If that’s what you were summarizing just now then you ignored my request. I asked you to critique Bayesian updating with concrete examples.
Now, if Popper wrote a critique of Bayesian updating, I want to read it. So tell me the title. But he must specifically talk about Bayesian updating. Or, if that is not available, anything written by Popper which mentions the word “Bayes” or “Bayesian” at all. Or that discusses Bayes’ equation.
If you check out the indexes of Popper’s books it’s easy to find the word Bayes. Try it some time...

There was nothing in The Open Society and Its Enemies, Objective Knowledge, Popper Selections, or Conjectures and Refutations, nor in any of the books on Popper that I located. The one exception was The Logic of Scientific Discovery, where the mentions of Bayes are purely technical discussion of the theorem (with which Popper, reasonably enough, has no problem), with no criticism of Bayesians. In short, after going through the indices of Popper’s main works as you recommended, I found mentions of Bayes, but no critique by Popper of Bayes or Bayesians.
Nor were Bayes or Bayesians mentioned anywhere in David Deutsch’s book The Beginning of Infinity.
So I return to my earlier request:
I asked you to critique Bayesian updating with concrete examples.
I have repeatedly requested this, and in reply been given either condescension, or a fantastically obscure and seemingly self-contradictory response, or a major misinterpretation of my request, or a recommendation that I look to Popper, which recommendation I have followed with no results.
I think your problem is that you don’t understand what issues are at stake, so you don’t know what you’re trying to find.
You said:
anything written by Popper which mentions the word “Bayes” or “Bayesian” at all. Or that discusses Bayes’ equation.
But then when you found a well known book by Popper which does have those words, and which does discuss Bayes’ equation, you were not satisfied. You asked for something which wasn’t actually what you wanted. That is not my fault.
You also said:
You are changing the subject. I never asked you to summarize Popper’s solution to the problem of induction.
But you don’t seem to understand that Popper’s solution to the problem of induction is the same topic. You don’t know what you’re looking for. It wasn’t a change of topic. (Hence I thought we should discuss this. But you refused. I’m not sure how you expect to make progress when you refuse to discuss the topic the other guy thinks is crucial to continuing.)
Bayesian updating, as a method of learning in general, is induction. It’s trying to derive knowledge from data. Popper’s criticisms of induction, in general, apply. And his solution solves the underlying problem, rendering Bayesian updating unnecessary even if it weren’t wrong. (Of course, as usual, it’s right when applied narrowly to certain mathematical problems. It’s wrong when extended out of that context to be used for other purposes, e.g. to try to solve the problem of induction.)
So, question: what do you think you’re looking for? There is tons of stuff about probability in various Popper books, including chapter 8 of LScD, titled “Probability”. There is tons of explanation about the problem of induction, and why support doesn’t work, in various Popper books. Bayesian updating is a method of positively supporting theories; Popper criticized all such methods and his criticisms apply. In what way is that not what you wanted? What do you want?
So for example I opened to a random page in that chapter and found, on p. 183 at the start of section 66, that the first sentence is:
Probability estimates are not falsifiable.
This is a criticism of the Bayesian approach as unscientific. It’s not specifically about the Bayesian approach, in that it applies to various non-Bayesian probabilistic approaches (whatever those may be. Can you think of any other approaches besides Bayesian epistemology that you think this is targeted at? How would you do it without Bayes’ theorem?). In any case it is a criticism, and it applies straightforwardly to Bayesian epistemology. It’s not the only criticism.
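To make the quoted sentence concrete (a sketch, not Popper’s own example): a probabilistic hypothesis assigns nonzero probability to every possible finite data set, so no observation strictly contradicts it.

```python
# Sketch: why a probability estimate is not strictly falsifiable.
# Under H = "the coin is fair", even 100 heads in a row has
# probability (0.5)**100 > 0, so the data never contradicts H outright.
p_data_given_fair = 0.5 ** 100
print(p_data_given_fair > 0)  # True: astronomically improbable, never excluded
```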
The point of this criticism is that to even begin the Bayesian updating process you need probability estimates which are created unscientifically by making them up (no, making up a “prior” which assigns all of them at once, in a way vague enough that you can’t even use it in real life without “estimating” arbitrarily, doesn’t mean you haven’t just made them up).
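And a small sketch of that prior-dependence (hypothetical numbers): the same data, run through Bayes’ rule under two different made-up priors, gives different posteriors, and the updating step itself never tests the prior.

```python
# Sketch: the posterior depends on the prior, which updating never tests.
def posterior_fair(prior_fair, p_heads_biased=0.8, n_heads=5):
    # Likelihood of n_heads consecutive heads under each hypothesis.
    like_fair, like_biased = 0.5 ** n_heads, p_heads_biased ** n_heads
    num = prior_fair * like_fair
    return num / (num + (1 - prior_fair) * like_biased)

print(posterior_fair(0.5))   # ~0.087: "fair" looks nearly refuted
print(posterior_fair(0.99))  # ~0.904: same data, "fair" still favored
```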
EDIT: read the first 2 footnotes in section 81 of LScD, plus section 81 itself. And note that the indexer did not miss this but included it...
Bayesian updating, as a method of learning in general, is induction. It’s trying to derive knowledge from data.
Only in a sense so broad that Popper can rightly be accused of the very same thing. Bayesians use experience to decide between competing hypotheses. That is the sort of “derive” that Bayesians do. But if that is “deriving”, then Popper “derives”. David Deutsch, who you know, says the following:
But, in reality, scientific theories are not ‘derived’ from anything. We do not read them in nature, nor does nature write them into us. They are guesses – bold conjectures. Human minds create them by rearranging, combining, altering and adding to existing ideas with the intention of improving upon them. We do not begin with ‘white paper’ at birth, but with inborn expectations and intentions and an innate ability to improve upon them using thought and experience. Experience is indeed essential to science, but its role is different from that supposed by empiricism. It is not the source from which theories are derived. Its main use is to choose between theories that have already been guessed. That is what ‘learning from experience’ is.
I direct you specifically to this sentence:
Experience is indeed essential to science, but its role is different from that supposed by empiricism. It is not the source from which theories are derived. Its main use is to choose between theories that have already been guessed.
This is what Bayesians do. Experience is what Bayesians use to choose between theories which have already been guessed. They do this using Bayes’ Theorem. But look back at the first sentence of the passage:
But, in reality, scientific theories are not ‘derived’ from anything.
Clearly, then, Deutsch does not consider using the data to choose between theories to be “deriving”. But Bayesians use the data to choose between theories. Therefore, as Deutsch himself defines it, Bayesians are not “deriving”.
The point of this criticism is that to even begin the Bayesian updating process you need probability estimates which are created unscientifically by making them up
Yes, the Bayesians make them up, but notice that Bayesians therefore are not trying to derive them from data—which was your initial criticism above. Moreover, this is not importantly different from a Popperian scientist making up conjectures to test. The Popperian scientist comes up with some conjectures, and then, as Deutsch says, he uses experimental data to “choose between theories that have already been guessed”. How exactly does he do that? Typical data does not decisively falsify a hypothesis. There is, just for starters, the possibility of experimental error. So how does one really employ data to choose between competing hypotheses? Bayesians have an answer: they choose on the basis of how well the data fits each hypothesis, which they interpret to mean how probable the data is given the hypothesis. Whether he admits it or not, the Popperian scientist can’t help but do something fundamentally the same. He has no choice but to deal with probabilities, because probabilities are all he has.
The Popperian scientist, then, chooses between theories that he has guessed on the basis of the data. Since the data, being uncertain, does not decisively refute either theory but is merely more, or less, probable given the theory, then the Popperian scientist has no choice but to deal with probabilities. If the Popperian scientist chooses the theory that the data fits best, then he is in effect acting as a Bayesian who has assigned to his competing theories the same prior.
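A toy sketch of the equivalence I am claiming (hypothetical numbers): with equal priors, ranking theories by posterior just is ranking them by how well the data fits.

```python
# Sketch: equal priors make the posterior ranking a likelihood ranking.
likelihood = {"T1": 0.3, "T2": 0.05}  # P(data | theory), made up
prior = {"T1": 0.5, "T2": 0.5}        # same prior for both theories
post_unnorm = {t: prior[t] * likelihood[t] for t in prior}
best_posterior = max(post_unnorm, key=post_unnorm.get)
best_fit = max(likelihood, key=likelihood.get)
print(best_posterior == best_fit)  # True: the same theory wins either way
```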
Where do you get the theories you consider?

Do you understand DD’s point, which is in both his books, that the majority of the time theories are rejected without testing? Testing is only useful when dealing with good explanations.

Do you understand that data alone cannot choose between the infinitely many theories consistent with it, which reach a wide variety of contradictory and opposite conclusions? So Bayesian updating based on data does not solve the problem of choosing between theories. What does?
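To make that concrete (a toy sketch): two theories that fit every past observation equally well keep exactly whatever ratio the prior gave them, no matter how much data comes in.

```python
# Sketch: updating cannot separate theories that fit the data equally.
# H1: "all observed values are 1 forever"
# H2: "values are 1 until tomorrow, then 2" (a grue-style rival)
# Both assign probability 1 to every observation made so far.
posterior_odds = 3.0             # arbitrary prior odds H1:H2
for _ in range(10**6):           # a million confirming observations
    posterior_odds *= 1.0 / 1.0  # identical fit, so the ratio never moves
print(posterior_odds)  # still 3.0: the data never touched the choice
```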
Do you understand DD’s point, which is in both his books, that the majority of the time theories are rejected without testing? Testing is only useful when dealing with good explanations.
Bayesians are also seriously concerned with the fact that an infinity of theories are consistent with the evidence. DD evidently doesn’t think we are, given his comments on Occam’s Razor, which he appears to know only in an old, crude version; but I think there is a lot in common between his “good explanation” criterion and parsimony considerations.
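For what it’s worth, here is a minimal sketch of how parsimony standardly enters a Bayesian treatment (a rough minimum-description-length-style prior; DD’s criterion is not this, but they overlap):

```python
# Sketch: an Occam-style prior that penalizes complexity.
# Weight each hypothesis by 2**(-length), a crude stand-in for
# description length; shorter (simpler) theories start ahead.
lengths = {"x = 1": 5, "x = 1 until 2030, then 2": 24}  # name -> length
unnorm = {h: 2.0 ** -n for h, n in lengths.items()}
total = sum(unnorm.values())
prior = {h: w / total for h, w in unnorm.items()}
print(prior)  # the simpler theory gets nearly all the prior mass
```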
We aren’t “seriously concerned” because we have solved the problem, and it’s not particularly relevant to our approach.
We just bring it up as a criticism of epistemologies that fail to solve the problem… Because they have failed, they should be rejected.
You haven’t provided details about your fixed Occam’s razor, a specific criticism of any specific thing DD said, a solution to the problem of induction (all epistemologies need one of some sort), or a solution to the infinity of theories problem.
Probability estimates are essentially the bookkeeping which Bayesians use to keep track of which things they’ve falsified, and which things they’ve partially falsified. At the time Popper wrote that, scientists had not yet figured out the rules for using probability correctly; the stuff he was criticizing really was wrong, but it wasn’t the same stuff people use today.
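As a sketch of that bookkeeping (hypothetical numbers): “partially falsified” here means a hypothesis whose posterior repeated unfavorable evidence has driven toward, but never to, zero.

```python
# Sketch: "partial falsification" as a posterior pushed toward zero.
posterior_odds = 1.0       # start with even odds, H vs. not-H
for _ in range(5):
    posterior_odds *= 0.4  # each datum fits H only 0.4x as well
prob_H = posterior_odds / (1 + posterior_odds)
print(prob_H)  # ~0.01: heavily discounted, never strictly refuted
```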
At the time Popper wrote that, scientists had not yet figured out the rules for using probability correctly; the stuff he was criticizing really was wrong, but it wasn’t the same stuff people use today.
Is this true? Popper wrote LScD in 1934. Keynes and Ramsey wrote about using probability to handle uncertainty in the 1920s although I don’t think anyone paid attention to that work for a few years. I don’t know enough about their work in detail to comment on whether or not Popper is taking it into account although I certainly get the impression that he’s influenced by Keynes.
According to the Wikipedia page, Cox’s theorem first appeared in R. T. Cox, “Probability, Frequency, and Reasonable Expectation,” Am. J. Phys. 14, 1–13 (1946). Prior to that, I don’t think probability had much in the way of philosophical foundations, although they may have gotten the technical side right. And correct use of probability for more complex things, like causal models, didn’t come until much later. (And Popper was dealing with the case of science-in-general, which requires those sorts of advanced tools.)
The English version of LScD came out in 1959. It wasn’t a straight translation; Popper worked on it. In my (somewhat vague) understanding he changed some stuff or at least added some footnotes (and appendices?).
Anyway Popper published plenty of stuff after 1946 including material from the LScD postscript that got split into several books, and also various books where he had the chance to say whatever he wanted. If he thought there was anything important to update he would have. And for example probability gets a lot of discussion in Popper’s replies to his critics, and Bayes’ theorem in particular comes up some; that’s from 1974.
So for example on page 1185 of the second Schilpp volume, Popper says he never doubted Bayes’ theorem but that “it is not generally applicable to hypotheses which form an infinite set”.
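To spell out the standard point behind that remark (my gloss, not Popper’s wording): over a countably infinite set of hypotheses there is no uniform prior, so updating can only begin after imposing some decaying, and in that sense arbitrary, weighting.

```latex
% No uniform prior exists on a countably infinite hypothesis set:
\[
  P(H_n) = c \ \text{for all } n
  \;\Longrightarrow\;
  \sum_{n=1}^{\infty} P(H_n) \in \{0, \infty\}, \ \text{never } 1.
\]
% A proper prior must decay, e.g. $P(H_n) = (1-q)\,q^{\,n-1}$ with
% $0 < q < 1$; nothing in the data fixes the choice of $q$.
```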
How can something be partially falsified? It’s either consistent with the evidence or contradicted by it; that’s a dichotomy. To allow partial falsification you’d have to judge in some other way, one with more than two outcomes. What way?
Probability estimates are essentially the bookkeeping which Bayesians use to keep track of which things they’ve falsified, and which things they’ve partially falsified.
You’re saying you started without them, and come up with some in the middle. But how does that work? How do you get started without having any?
the stuff he was criticizing really was wrong, but it wasn’t the same stuff people use today.
Changing the math cannot answer any of his non-mathematical criticisms. So his challenge remains.
Here’s one way.

(It’s subject to limitations that do not constrain the Bayesian approach, and as near as I can tell, is mathematically equivalent to a non-informative Bayesian approach when it is applicable, but the author’s justification for his procedure is wholly non-Bayesian.)
I think you mixed up Bayes’ Theorem and Bayesian Epistemology. The abstract begins:
By representing the range of fair betting odds according to a pair of confidence set estimators, dual probability measures on parameter space called frequentist posteriors secure the coherence of subjective inference without any prior distribution.
They have a problem with a prior distribution, and wish to do without it. That’s what I think the paper is about. The abstract does not say “we don’t like Bayes’ theorem and figured out a way to avoid it.” Did you have something else in mind? What?
I had in mind a way of putting probability distributions on unknown constants that avoids prior distributions and Bayes’ theorem. I thought that this would answer the question you posed when you wrote:
It’s not specifically about the Bayesian approach in that it applies to various non-Bayesian probabilistic approaches (whatever those may be. can you think of any other approaches besides Bayesian epistemology that you think this is targeted at?)