Perhaps a Bayesian approach would be illuminating. There are four kinds of objects in the world: black ravens, nonblack ravens, black nonravens, and nonblack nonravens. Call these A, B, C, and D. Let the probability you assign to the next object that you encounter being in one of these classes be p, q, r, and s respectively. Rather than having two competing hypotheses about the blackness of ravens, there is a prior distribution of the parameters p, q, r, and s.
(Note that the way I’ve set this up removes any concept of blackness common to black ravens and black nonravens. The astute—more astute than me, for whom this is the last paragraph written—may guess at once that P naq Q ner tbvat gb or rkpunatrnoyr va guvf sbezhyngvba, naq gurersber arvgure zber guna gur bgure pna or rivqrapr eryngvat gb gur inyhrf bs c naq d. I come back to this at the end.)
In a state of total ignorance, a reasonable prior for the distribution of (p,q,r,s) is that they are uniformly distributed over the tetrahedron in four-dimensional space defined by these numbers being in the range 0 to 1 and their sum being 1.
After observing numbers a, b, c, and d of the four categories, the posterior is (after a bit of mathematics) p^a q^b r^c s^d/K(a,b,c,d), where K(a,b,c,d) = a!b!c!d!/(N+3)!, where N = a+b+c+d. (The formula generalises to any number of categories, replacing 3 by the number of categories minus 1.)
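For reference, the "bit of mathematics" is the standard Dirichlet integral. The sketch below writes it out, taking s = 1 − p − q − r and integrating over the simplex Δ (the symbol Δ is introduced here, not in the original):

```latex
\int_{\Delta} p^{a}\, q^{b}\, r^{c}\, s^{d}\; dp\, dq\, dr
  \;=\; \frac{a!\, b!\, c!\, d!}{(a+b+c+d+3)!}
  \;=\; K(a,b,c,d),
\qquad
\Delta = \{\, p, q, r \ge 0,\; p + q + r \le 1 \,\},\quad s = 1 - p - q - r .
```

As a check, K(0,0,0,0) = 1/3!, so with no observations the posterior density is the constant 6, which is the uniform density on a simplex of volume 1/6.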
The expectation value of p is K(a+1,b,c,d)/K(a,b,c,d) = (a+1)/(N+4), and similarly for q, r, and s. (Check: these add up to 1, as they should.)
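The expectation formula can be checked with exact rational arithmetic. This is a minimal sketch; the function names `K` and `posterior_means` are chosen here for illustration and the counts in the usage lines are arbitrary.

```python
from fractions import Fraction
from math import factorial


def K(a, b, c, d):
    """Normalising constant a! b! c! d! / (N+3)!, with N = a+b+c+d."""
    n = a + b + c + d
    numerator = factorial(a) * factorial(b) * factorial(c) * factorial(d)
    return Fraction(numerator, factorial(n + 3))


def posterior_means(a, b, c, d):
    """Posterior expectations of (p, q, r, s) as ratios of K values."""
    base = K(a, b, c, d)
    return (
        K(a + 1, b, c, d) / base,
        K(a, b + 1, c, d) / base,
        K(a, b, c + 1, d) / base,
        K(a, b, c, d + 1) / base,
    )


# Arbitrary illustrative counts: 3 A, 1 B, 5 C, 2 D, so N = 11.
means = posterior_means(3, 1, 5, 2)
print(means, sum(means))
```

Each component comes out as (count + 1)/(N + 4), and the four sum to exactly 1.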
How does the expectation value of p change when you observe that the (N+1)th object you draw is an A, B, C, or D?
If it’s an A, the ratio of the new expectation value to the old is (a+2)(N+4)/((a+1)(N+5)). For large N this is approximately 1 + 1/(a+1) − 1/(N+5) > 1.
If it’s a B (and the cases of C and D are the same) then the ratio is (N+4)/(N+5) = 1 − 1/(N+5) < 1.
So observing an A increases your estimate of the proportion of the population that are A, and observing anything else decreases it, as one would expect. That was just another sanity check.
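The same sanity check can be run numerically. The sketch below uses the closed form (a+1)/(N+4) for the expectation of p; `mean_p` and `update_ratio` are illustrative names, and the counts a = 10, N = 100 are arbitrary.

```python
from fractions import Fraction


def mean_p(a, n):
    """Posterior expectation of p after a observations of A out of n objects."""
    return Fraction(a + 1, n + 4)


def update_ratio(a, n, saw_a):
    """Ratio of the new expectation of p to the old after one more object."""
    new_a = a + 1 if saw_a else a
    return mean_p(new_a, n + 1) / mean_p(a, n)


a, n = 10, 100
ratio_a = update_ratio(a, n, saw_a=True)       # next object is an A
ratio_other = update_ratio(a, n, saw_a=False)  # next object is a B, C, or D
approx_a = 1 + Fraction(1, a + 1) - Fraction(1, n + 5)
print(float(ratio_a), float(approx_a), float(ratio_other))
```

The exact ratio for an A differs from the large-N approximation by 1/((a+1)(N+5)), which is why the approximation is good once either count is large.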
Now consider the ratio q/p, the ratio of non-black to black ravens. The expectation of this, assuming a>0 (you have seen at least one black raven), is K(a-1,b+1,c,d)/K(a,b,c,d) = (b+1)/a. This increases to (b+2)/a when you observe a nonblack raven, and decreases to (b+1)/(a+1) when you observe a black one. (I would have calculated the expectation of q/(q+p), the expected proportion of ravens that are nonblack, but that is more complicated.)
If you have seen a thousand black ravens and no nonblack ones, the increase is from 1/1000 to 2/1000, i.e. a doubling, but the decrease is from 1/1000 to 1/1001, a tiny amount. On the log-odds scale, the first is 1 bit, the second is about 0.0014 bits.
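These figures can be reproduced directly from E[q/p] = (b+1)/a. This is a small sketch; the function name is hypothetical, and "bits" here means the base-2 logarithm of the ratio between the old and new expectations.

```python
from fractions import Fraction
from math import log2


def expected_q_over_p(a, b):
    """Posterior expectation of q/p; requires at least one black raven."""
    if a <= 0:
        raise ValueError("need a > 0 for the expectation to exist")
    return Fraction(b + 1, a)


before = expected_q_over_p(1000, 0)          # 1000 black ravens, 0 nonblack
after_nonblack = expected_q_over_p(1000, 1)  # then one nonblack raven
after_black = expected_q_over_p(1001, 0)     # or instead one more black raven

bits_up = log2(float(after_nonblack / before))
bits_down = log2(float(before / after_black))
print(bits_up, bits_down)
```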
On this analysis, observations of nonravens, whether black or not, have no effect on the expectation of the proportion of nonblack ravens.
If we reformulate the original hypothesis that all ravens are black as “q/p < 0.000001”, then observing the 1001st raven to be green will pretty much kill that hypothesis, until we see of the order of a million black ravens in a row without a nonblack one. But the nonraven objects will continue to be irrelevant: C and D are exchangeable in this formulation of the problem.
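One way to put numbers on this is to compute the posterior probability of the reformulated hypothesis itself, rather than the expectation of q/p. The sketch below assumes the standard Dirichlet property that, under the uniform prior, q/(p+q) has a Beta(b+1, a+1) posterior. For integer counts its cumulative distribution is a binomial tail. The function name is made up for illustration.

```python
from math import comb, exp, log1p


def prob_ratio_below(eps, a, b):
    """Posterior P(q/p < eps) after a black and b nonblack ravens.

    Assumes q/(p+q) ~ Beta(b+1, a+1). The event q/p < eps is the same as
    q/(p+q) < x with x = eps/(1+eps), and for integer parameters the Beta
    CDF equals P(Binomial(a+b+1, x) >= b+1).
    """
    x = eps / (1.0 + eps)
    n = a + b + 1
    lower_tail = sum(
        comb(n, j) * x**j * exp((n - j) * log1p(-x)) for j in range(b + 1)
    )
    return 1.0 - lower_tail


eps = 1e-6
p_before = prob_ratio_below(eps, a=1000, b=0)      # 1000 black ravens only
p_after = prob_ratio_below(eps, a=1000, b=1)       # then one green raven
p_recovered = prob_ratio_below(eps, a=2_000_000, b=1)
print(p_before, p_after, p_recovered)
```

Under these assumptions the single green raven cuts the probability of the hypothesis by a factor of roughly two thousand, and it takes a couple of million further black ravens to bring it back above one half. Counts of nonravens do not appear in the function at all.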
Now reconsider the original paradox on its own terms. I will draw a connection with the grue paradox.
Suppose we accept the paradoxical argument that “All ravens are black” and “all nonblack things are nonravens” are logically equivalent, and therefore everything that is evidence for one is evidence for the other.
Let “X is bnonb” mean “X is a black raven or a nonblack nonraven.” Consider the hypothesis that all ravens are bnonb, and its contrapositive, that all non-bnonb things are nonravens. In effect, we have exchanged C and D, but not A and B. Every argument that nonblack nonravens are evidence for all ravens being black is also an argument that non-bnonb nonravens are evidence for all ravens being bnonb. But substituting the definition of bnonb in the latter, it claims that black nonravens are evidence for the blackness of ravens. Hence both black and nonblack nonravens support the blackness of ravens.
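The substitution can be checked mechanically by enumerating the four categories. In this sketch an object is a pair of booleans (raven, black); the helper names are illustrative only.

```python
# The four kinds of object, keyed by (is_raven, is_black).
LABELS = {
    (True, True): "A",    # black raven
    (True, False): "B",   # nonblack raven
    (False, True): "C",   # black nonraven
    (False, False): "D",  # nonblack nonraven
}


def is_bnonb(raven, black):
    """bnonb = black raven or nonblack nonraven, i.e. raven and black agree."""
    return raven == black


def instances(predicate):
    """Labels of the categories satisfying the predicate."""
    return sorted(label for kind, label in LABELS.items() if predicate(*kind))


nonblack_nonravens = instances(lambda raven, black: not raven and not black)
nonbnonb_nonravens = instances(
    lambda raven, black: not raven and not is_bnonb(raven, black)
)
# Among ravens, being bnonb is the same as being black.
same_for_ravens = all(is_bnonb(True, black) == black for black in (True, False))

print(nonblack_nonravens, nonbnonb_nonravens, same_for_ravens)
```

So "all ravens are bnonb" is the same hypothesis as "all ravens are black", while the contrapositive instances move from D to C.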
But there’s more. Swapping black and nonblack in all of the above would imply that both black and nonblack nonravens are evidence for the nonblackness of ravens.
At this point we appear to have proved that all nonravens are evidence for every hypothesis about ravens. I don’t think the original paradox can be saved by arguing that yes, nonblack nonravens are evidence, just an utterly insignificant amount, as some do.
A further elaboration then occurred to me. If non-ravens are, as the above argument claims, not evidential for the properties of ravens, then neither are non-European ravens evidential for the properties of European ravens, which does not seem plausible. This amount of confusion suggests that some essential idea is missing. I had thought causality or mechanism, but the Google search suggested by that turned up this paper: “Infinitely many resolutions of Hempel’s paradox” by Kevin Korb. It takes a purely Bayesian approach, which I think has something in common (in section 4.1) with the arguments of the original post. His conclusion:
We should well and truly forget about positive instance confirmation: it is an epiphenomenon of Bayesian confirmation. There is no qualitative theory of confirmation that can adequately approximate what likelihood ratios tell us about confirmation; nor can any qualitative theory lay claim to the success (real, if limited) of Bayesian confirmation theory in accounting for scientific methodology.
ETA: Another paper with a Bayesian analysis of the subject.
And then there is the Wason selection task, where you do have to examine both the raven and the non-black object to determine the truth of “all ravens are black”. But with actual ravens and bananas, when you pick up a non-black object, you will already have seen whether it is a raven or not. Given that it is not a raven, examination of its colour tells you nothing more about ravens.
“A further elaboration then occurred to me. If non-ravens are, as the above argument claims, not evidential for the properties of ravens, then neither are non-European ravens evidential for the properties of European ravens, which does not seem plausible.”—Wait so you’re saying that the argument you just made in the post above is incorrect? Or that the argument in main is incorrect?
I am saying that I am confused.
Hempel gave an argument for a conclusion that seems absurd. I first elaborated a Bayesian argument for the opposite conclusion: non-black non-ravens say nothing about the blackness of ravens. Because that conclusion seems reasonable at first sight, one might think the argument reasonable too. That inference is itself unreasonable, because there is nothing to stop a bad argument giving a correct conclusion.
Then I showed that combining Hempel’s argument with the grue-like concept of bnonb yielded a Hempel-style argument for non-ravens of all colours being evidence for the blackness of ravens, and further extended it to show that all properties of non-ravens are evidence for all properties of ravens.
Then I took my original argument and observed that it still works after replacing “raven” and “non-raven” by “European raven” and “non-European raven”.
At this point both arguments are producing absurd results. Hempel’s has broadened to proving that everything is evidence for everything else, and mine to proving that nothing is evidence for anything else.
I shall have to work through the arguments of Korb and Gilboa to see what they yield when applied to bnonb ravens.
Meanwhile, the unanswered question is, when can an observation of one object tell you something about another object not yet observed?
Having now properly read Korb’s paper, the basic problem he points out is that to do a Bayesian update regarding a hypothesis h in the presence of new evidence e, one must calculate the likelihood ratio P(e|h)/P(e|not-h). Not-h consists of the whole of the hypothesis space excluding h. What that hypothesis space is affects the likelihood ratio. The ratio can be made equal to anything at all, for some suitable choice of the hypothesis space, by constructions similar to those of the OP.
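A toy illustration of that dependence follows. It is not Korb's construction, and every number in it is hypothetical. Here h fixes the four category probabilities with no nonblack ravens, and each candidate not-h makes room for nonblack ravens by taking probability from a different category. The evidence e is "the next object is a nonblack nonraven" (category D).

```python
from fractions import Fraction as F

# Hypothetical category probabilities under h: all ravens are black.
H = {"A": F(1, 10), "B": F(0), "C": F(3, 10), "D": F(6, 10)}

# Three hypothetical single-point versions of not-h.
ALTERNATIVES = {
    "B taken from A": {"A": F(1, 20), "B": F(1, 20), "C": F(3, 10), "D": F(6, 10)},
    "B taken from D": {"A": F(1, 10), "B": F(1, 10), "C": F(3, 10), "D": F(5, 10)},
    "B taken from C, D larger": {"A": F(1, 10), "B": F(1, 10), "C": F(1, 10), "D": F(7, 10)},
}


def likelihood_ratio(evidence, h, not_h):
    """P(e|h) / P(e|not-h) for a single observed category."""
    return h[evidence] / not_h[evidence]


ratios = {
    name: likelihood_ratio("D", H, alt) for name, alt in ALTERNATIVES.items()
}
for name, ratio in ratios.items():
    print(name, ratio)
```

The same observation is neutral, confirming, or disconfirming for h depending only on which alternative is taken to stand for not-h.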
Korb’s analysis reaches the same negative conclusion when applied to bnonb ravens, or to European and non-European ravens.
Although this settles Hempel’s paradox, it leaves unanswered a more fundamental question: how should you update in the face of new evidence? The Bayesian answer is on the face of it simple mathematics: P(e|h)/P(e|not-h). But where does the hypothesis space that defines not-h come from?
In “small world” examples of Bayesian reasoning, the hypothesis space is a parameterised family of distributions, and the prior is a probability distribution on the parameter space. New evidence will shift that distribution. If the truth is a member of that family, evidence is likely to converge on the correct parameters.
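A minimal small-world example, as a sketch: the family is a single parameter theta (here, hypothetically, the proportion of ravens that are black), the hypothesis space is a nine-point grid, and the prior is uniform over the grid. The grid, the data, and the function name are all assumptions made for illustration.

```python
from fractions import Fraction


def grid_posterior(thetas, prior, n_black, n_nonblack):
    """Bayes update of a discrete prior over theta after Bernoulli data."""
    weights = [
        pr * th**n_black * (1 - th) ** n_nonblack
        for th, pr in zip(thetas, prior)
    ]
    total = sum(weights)
    return [w / total for w in weights]


thetas = [Fraction(k, 10) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
prior = [Fraction(1, 9)] * 9                      # uniform over the grid
posterior = grid_posterior(thetas, prior, n_black=8, n_nonblack=2)
best = thetas[posterior.index(max(posterior))]
print(best, [float(p) for p in posterior])
```

The posterior mass shifts toward the grid point nearest the observed frequency. If the true value lay outside the grid, the update would still pick a winner from inside it, which is the limitation the large-world question is about.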
I have never seen a convincing account of how to do “large world” Bayesian reasoning, where the hypothesis space is “all theories whatsoever, even yet-unimagined ones, describing this aspect of the world”. Solomonoff induction is the least unconvincing, by virtue only of being precisely defined and having various theorems provable about it, but one of those theorems is that it is uncomputable. Until I see someone make some sort of Solomonoff-based method work to the extent of becoming a standard part of the statistician’s toolkit, I shall continue to be sceptical of whether it has any practical numerical use. How should you navigate in a large-world hypothesis space, when you notice that P(e|h) is so absurdly low that the truth, whatever it is, must be elsewhere?
Given the existence of polar bears, arctic foxes, and snow leopards, I wondered if there might be any white-feathered ravens in the colder parts of the world. A Google search indicates that while ravens are found there, they are just as black as their temperate relatives. I guess you don’t need camouflage to sneak up on corpses. Now that looks like good evidence for all ravens being black: looking in places where it is plausible that there could be white ravens, and finding ravens, but only black ones. The not-h hypothesis space has room for large numbers of white ravens in a certain type of remote place. That part of the space came from observing polar bears and the like, and imagining a similar mechanism, whatever it might be, in ravens. Finding that even there, all observed ravens are black, removes probability mass from that part of the space.
An excellent quote! If Stefan had found that one I should have been honor-bound to add it to the post :P