So what if p(H) = 1, p(H|A) = .4, p(H|B) = .3, and p(H|C) = .3? The evidence would suggest all are wrong. But I have also determined that A, B, and C are the only possible explanations for H. Clearly there is something wrong with my measurement, but I have no method of correcting for this problem.
H stands for Hypothesis. You have three: HA, HB, and HC. Let’s say your prior is that they are equally probable, so the unconditional P(HA) = P(HB) = P(HC) = 0.33
Let’s also say you saw some evidence E and your posteriors are P(HA|E) = 0.4, P(HB|E) = 0.3, P(HC|E) = 0.3. This means that evidence E confirms HA because P(HA|E) > P(HA). This does not mean that you are required to believe that HA is true or bet your life’s savings on it.
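For concreteness, a minimal sketch of that update in Python. The likelihood values P(E|H) are invented purely so that the posteriors come out at 0.4, 0.3, 0.3; nothing in the thread specifies them.

```python
# Hypothetical worked example of the update described above.
priors = {"HA": 1/3, "HB": 1/3, "HC": 1/3}          # equal priors
likelihoods = {"HA": 0.4, "HB": 0.3, "HC": 0.3}      # assumed P(E | H)

# Bayes' rule: P(H|E) = P(E|H) P(H) / P(E), where P(E) sums over the hypotheses.
p_e = sum(likelihoods[h] * priors[h] for h in priors)
posteriors = {h: likelihoods[h] * priors[h] / p_e for h in priors}

for h in priors:
    verdict = "confirmed" if posteriors[h] > priors[h] else "not confirmed"
    print(h, round(posteriors[h], 3), verdict)
# HA 0.4, confirmed; HB and HC 0.3, not confirmed (0.3 < 1/3)
```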
That’s a really good explanation of part of the problem I was getting at. But that requires considering the three hypotheses as a group rather than in isolation from all other hypotheses to calculate 0.33.
But that requires considering the three hypotheses as a group rather than in isolation from all other hypotheses to calculate 0.33
No, it does not.
Let’s say you have a hypothesis HZ. You have a prior for it, say P(HZ) = 0.2, which means that you think there is a 20% probability that HZ is true and an 80% probability that something else is true. Then you see evidence E and it so happens that the posterior for HZ becomes 0.25, so P(HZ|E) = 0.25. This means that evidence E confirmed hypothesis HZ, and that statement requires nothing from whatever other hypotheses (HA, HB, HC, HD, etc.) there might be.
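A sketch of the same point with a single hypothesis against a catch-all; the two likelihood values are assumptions chosen only so that the posterior lands at 0.25.

```python
# HZ versus "something else"; likelihood values are hypothetical.
p_hz = 0.2               # prior P(HZ)
p_e_given_hz = 0.4       # assumed P(E | HZ)
p_e_given_rest = 0.3     # assumed P(E | not HZ)

p_e = p_e_given_hz * p_hz + p_e_given_rest * (1 - p_hz)
p_hz_given_e = p_e_given_hz * p_hz / p_e
print(p_hz_given_e)      # 0.25 > 0.2, so E confirms HZ without enumerating the alternatives
```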
How would you calculate that prior of 0.2? In my original example, my prior was 1, and then you transformed it into 0.33 by dividing by the number of possible hypotheses. You wouldn’t be able to do that without taking the other two possibilities into account. As I said, the issue can be corrected for if the number of hypotheses is known, but not if the number of possibilities is unknown. However, philosophical treatments of Bayesian confirmation theory frequently don’t consider this problem. From this paper by Morey, Romeijn, and Rouder:
Overconfident Bayes is problematic because it lacks the necessary humility that accompanies the understanding that inferences are based on representations. We agree that there is a certain silliness in computing a posterior odds between model A and model B, seeing that it is in favour of model A by 1 million to one, and then declaring that model A has a 99.9999% probability of being true. But this silliness arises not from model A being false. It arises from the fact that the representation of possibilities is quite likely impoverished because there are only two models. This impoverished representation makes translating the representational statistical inferences into inferences pertaining to the real world difficult or impossible.
You need to read up on basic Bayesianism.
Priors are always for a specific hypothesis. If your prior is 1, this means you believe this hypothesis unconditionally and no evidence can make you stop believing it.
You are talking about the requirement that all mutually exclusive probabilities must sum to 1. That’s just a property of probabilities and has nothing to do with Bayes.
the issue can be corrected for if the number of hypotheses is known, but not if the number of possibilities is unknown.
Yes, it can. To your “known” hypotheses you just add one more which is “something else”.
Really, just go read. You are confused because you misunderstand the basics. Stop with the philosophy and just figure out how the math works.
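A sketch of that “something else” move, with made-up numbers: the named hypotheses keep their priors and an explicit catch-all bucket soaks up the remaining mass.

```python
# Named hypotheses plus a catch-all; all numbers are hypothetical.
priors = {"HA": 0.3, "HB": 0.3, "HC": 0.3, "something else": 0.1}
likelihoods = {"HA": 0.4, "HB": 0.3, "HC": 0.3, "something else": 0.2}  # assumed P(E | H)

p_e = sum(likelihoods[h] * priors[h] for h in priors)
posteriors = {h: round(likelihoods[h] * priors[h] / p_e, 3) for h in priors}
print(posteriors)   # updating works even though "something else" is left unspecified
```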
I’m not arguing with the math; I’m arguing with how the philosophy is often applied. Consider the condition where my prior is greater than my evidence for all choices I’ve looked at, the number of possibilities is unknown, but I still need to make a decision about the problem? As the paper I was originally referencing mentioned, what if all options are false?
You are not arguing, you’re just being incoherent. For example,
Consider the condition where my prior is greater than my evidence for all choices I’ve looked at, the number of possibilities is unknown, but I still need to make a decision about the problem?
...that sentence does not make any sense.
what if all options are false?
Then the option “something else” is true.
But you can’t pick something else; you have to make a decision
What does “have to make a decision” mean when “all options are false”?
Are you thinking about the situation when you have, say, 10 alternatives with the probabilities of 10% each except for two, one at 11% and one at 9%? None of them are “true” or “false”, you don’t know that. What you probably mean is that even the best option, the 11% alternative, is more likely to be false than true. Yes, but so what? If you have to pick one, you pick the RELATIVE best and if its probability doesn’t cross the 50% threshold, well, them’s the breaks.
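In code, picking the relative best is just an argmax over the probabilities; the numbers below are the hypothetical ones from the comment above.

```python
# Ten alternatives: one at 11%, one at 9%, the rest at 10% (hypothetical).
probs = {"alt1": 0.11, "alt2": 0.09, **{f"alt{i}": 0.10 for i in range(3, 11)}}
best = max(probs, key=probs.get)
print(best, probs[best])   # pick the relative best, even though 0.11 is well below 0.5
```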
Yes, that is exactly what I’m getting at. It doesn’t seem reasonable to say you’ve confirmed the 11% alternative. But then there’s another problem: what if you have to make this decision multiple times? Do you throw out the other alternatives and only focus on the 11%? That would lead to status quo bias. So you have to keep the other alternatives in mind, but what do you do with them? Would you then say you’ve confirmed those other alternatives? This is where the necessity of something like falsification comes into play. You’ve got to continue analyzing multiple options as new evidence comes in, but trying to analyze all the alternatives is too difficult, so you need a way to throw out certain alternatives, but you never actually confirm any of them. These problems come up all the time in day-to-day decision making, such as deciding what’s for dinner tonight.
It doesn’t seem reasonable to say you’ve confirmed the 11% alternative.
In the context of Bayesian confirmation theory, it’s not you who “confirms” the hypothesis. It’s the evidence that confirms some hypothesis, and that happens at the prior → posterior stage. Once you’re dealing with posteriors, all the confirmation has already been done.
what if you have to make this decision multiple times?
Do you get any evidence to update your posteriors? Is there any benefit to picking different alternatives? If no and no, then sure, you repeat your decision.
That would lead to status quo bias.
No, it would not. That’s not what the status quo bias is.
You keep on using words without understanding their meaning. This is a really bad habit.
When I say throw out I’m talking about halting tests, not changing the decision.
If your problem is which tests to run, then you’re in the experimental design world. Crudely speaking, you want to rank your available tests by how much information they will give you and then do those which have high expected information and discard those which have low expected information.
True.
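A rough sketch of the expected-information ranking described above. The hypotheses, tests, and all numbers are hypothetical; the idea is just to score each candidate test by how much it is expected to shrink the entropy of your beliefs.

```python
import math

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def update(prior, like):
    """Bayes update of a prior dict given per-hypothesis likelihoods of one outcome."""
    unnorm = {h: prior[h] * like[h] for h in prior}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

def expected_information_gain(prior, test):
    """test[h] = P(positive result | h); results are binary for simplicity."""
    gain = 0.0
    for positive in (True, False):
        like = {h: test[h] if positive else 1 - test[h] for h in prior}
        p_result = sum(prior[h] * like[h] for h in prior)
        if p_result > 0:
            gain += p_result * (entropy(prior) - entropy(update(prior, like)))
    return gain

prior = {"HA": 0.4, "HB": 0.3, "HC": 0.3}                 # hypothetical beliefs
tests = {
    "test1": {"HA": 0.9, "HB": 0.1, "HC": 0.1},           # separates HA from the rest
    "test2": {"HA": 0.5, "HB": 0.5, "HC": 0.5},           # tells you nothing
}
ranked = sorted(tests, key=lambda t: expected_information_gain(prior, tests[t]), reverse=True)
print(ranked)   # run the high-information tests, drop the low-information ones
```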
All you have to do is not simultaneously use “confirm” to mean both “increase the probability of” and “assign high probability to”.
As for throwing out unlikely possibilities to save on computation: that (or some other shortcut) is sometimes necessary, but it’s an entirely separate matter from Bayesian confirmation theory or indeed Popperian falsificationism. (Popper just says to rule things out when you’ve disproved them. In your example, you have a bunch of things near 10%, and Popper gives you no licence to throw any of them out.)
Yes, sorry. I’m drawing on multiple sources which I recognize the rest of you haven’t read, and trying to translate them into short comments, which I’m probably not the best person to do, so the problem I’m talking about may come out a bit garbled. But I think the quote from the Morey et al. paper above describes the problem best.
You see how Morey et al. call the position they’re criticizing “Overconfident Bayesianism”? That’s because they’re contrasting it with another way of doing Bayesianism, about which they say “we suspect that most Bayesians adhere to a similar philosophy”. They explicitly say that what they’re advocating is a variety of Bayesian confirmation theory.
The part about deduction from the Morey et al. paper:
GS describe model testing as being outside the scope of Bayesian confirmation theory, and we agree. This should not be seen as a failure of Bayesian confirmation theory, but rather as an admission that Bayesian confirmation theory cannot describe all aspects of the data analysis cycle. It would be widely agreed that the initial generation of models is outside Bayesian confirmation theory; it should then be no surprise that subsequent generation of models is also outside its scope.
Who has been claiming that Bayesian confirmation theory is a tool for generating models?
(It can kinda-sorta be used that way if you have a separate process that generates all possible models, hence the popularity of Solomonoff induction around here. But that’s computationally intractable.)
As stated in my original comment, confirmation is only half the problem to be considered. The other half is inductive inference which is what many people mean when they refer to Bayesian inference. I’m not saying one way is clearly right and the other wrong, but that this is a difficult problem to which the standard solution may not be best.
You’d have to read the Andrew Gelman paper they’re responding to in order to see a criticism of confirmation.
As I said, the issue can be corrected for if the number of hypotheses is known, but not if the number of possibilities is unknown.
You don’t need to know the number, you need to know the model (which could have infinite hypotheses in it).
Your model (hypothesis set) could be specified by an infinite number of parameters, say “all possible means and variances of a Gaussian.” You can have a prior on this space, which is a density. You update the density with evidence to get a new density. This is Bayesian stats 101. Why not just go read about it? Bishop’s machine learning book is good.
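A rough sketch of what that looks like in practice, using a crude grid approximation rather than a proper conjugate or MCMC treatment (which is where a text like Bishop comes in); the data and grid here are made up.

```python
import math

data = [1.8, 2.1, 2.4, 1.9]                       # made-up observations

# Coarse grid over (mean, sigma): a stand-in for a density over infinitely many hypotheses.
means  = [i / 10 for i in range(0, 41)]           # 0.0 .. 4.0
sigmas = [i / 10 for i in range(1, 21)]           # 0.1 .. 2.0

def log_likelihood(mu, sigma):
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
               for x in data)

# Flat prior over the grid; posterior is proportional to prior times likelihood, then normalize.
unnorm = {(mu, s): math.exp(log_likelihood(mu, s)) for mu in means for s in sigmas}
z = sum(unnorm.values())
posterior = {k: v / z for k, v in unnorm.items()}

print(max(posterior, key=posterior.get))          # grid point with the most posterior mass
```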
True, but working from a model is not an inductive method, so it can’t be classified as confirmation through inductive inference, which is what I’m criticizing.
You are severely confused about the basics. Please unconfuse yourself before getting to the criticism stage.
??? IlyaShpitser, if I understand correctly, is talking about creating a model of a prior, collecting evidence, and then determining whether the model is true or false. That’s hypothesis testing, which is deduction, not induction.
You don’t understand.
You have a (possibly infinite) set of hypotheses. You maintain beliefs about this set. As you get more data, your beliefs change. To maintain beliefs you need a distribution/density. To do that you need a model (a model is just a set of densities you consider). You may have a flexible model and let the data decide how flexible you want to be (non-parametric Bayes stuff, I don’t know too much about it), but there’s still a model.
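A minimal sketch of that loop with a toy coin-bias model (the hypothesis set and data are made up): beliefs are a distribution over the hypotheses, and after each observation the posterior becomes the new prior.

```python
# Toy hypothesis set: three possible biases for a coin; update beliefs one flip at a time.
beliefs = {0.3: 1 / 3, 0.5: 1 / 3, 0.7: 1 / 3}       # P(bias)
flips = [1, 1, 0, 1]                                  # hypothetical data, 1 = heads

for flip in flips:
    unnorm = {b: p * (b if flip else 1 - b) for b, p in beliefs.items()}
    z = sum(unnorm.values())
    beliefs = {b: v / z for b, v in unnorm.items()}   # posterior becomes the next prior

print({b: round(p, 3) for b, p in beliefs.items()})   # beliefs shift toward the higher biases
```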
Suggesting for the third and final time to get off the internet argument train and go read a book about Bayesian inference.
Oh, sorry I misunderstood your argument. That’s an interesting solution.
That interesting solution is exactly what people doing Bayesian inference do. Any criticism you may have that doesn’t apply to what Ilya describes isn’t a criticism of Bayesian inference.
As much as I hate to do it, I am going to have to agree with Lumifer, you sound confused. Go read Bishop.
But that requires considering the three hypotheses as a group rather than in isolation from all other hypotheses to calculate 0.33.
Not really. A hypothesis’s prior probability comes from the total of all of your knowledge; in order to determine that P(HA)=0.33 Lumifer needed the additional facts that there were three possibilities that were all equally likely.
It works just as well if I say that my prior is P(HA)=0.5, without any exhaustive enumeration of the other possibilities. Then evidence E confirms HA if P(HA|E)>P(HA).
(One should be suspicious that my prior probability assessment is a good one if I haven’t accounted for all the probability mass, but the mechanisms still work.)
One should be suspicious that my prior probability assessment is a good one if I haven’t accounted for all the probability mass, but the mechanisms still work.
Which is one of the other problems I was getting at.
If you start with inconsistent assumptions, you get inconsistent conclusions. If you believe P(H)=1, P(A or B or C)=1, and P(H|A) etc. are all <1, then you have already made a mistake. Why are you blaming this on Bayesian confirmation theory?
You are confused. If p(H) = 1, then p(H, anything) = p(anything), so p(H | anything) = 1 whenever p(anything) > 0.
Wait, how would you get P(H) = 1?
Fine. p(H) = 0.5, p(H|A) = 0.2, p(H|B) = 0.15, p(H|C) = 0.15. It’s not really relevant to the problem.
The relevance is that it’s a really weird way to set up a problem. If P(H)=1 and P(H|A)=0.4 then it is necessarily the case that P(A)=0. If that’s not immediately obvious to you, you may want to come back to this topic after sleeping on it.
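For reference, the derivation behind that claim is just the product rule:

```latex
P(H)=1 \;\Rightarrow\; P(\neg H)=0 \;\Rightarrow\; P(\neg H \cap A)=0
\;\Rightarrow\; P(H \cap A)=P(A)
\;\Rightarrow\; P(H \mid A)=\frac{P(H \cap A)}{P(A)}=1 \quad \text{whenever } P(A)>0 .
```

So a conditional like P(H|A) = 0.4 is only consistent with P(H) = 1 if P(A) = 0.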
Fair enough.
\sum_i p(H|i) need not add up to p(H) (or indeed to 1).
No, it doesn’t.
Edit—I’m agreeing with you. Sorry if that wasn’t clear.