[i]n your theory A, you can predict X with probability 1...
[...] The rules of Bayesian rationality preclude assigning an a priori probability of 1 to a synthetic proposition: nothing empirical is so certain that refuting evidence is impossible.
We all agree on this point. Yudkowsky isn’t supposing that anything empirical has probability 1.
In the line you quote, Yudkowsky is saying that even if theory A predicts data X with probability 1 (setting aside the question of whether this is even possible), confirming that X is true still wouldn’t push our confidence in the truth of A past a certain threshold, which might be far short of 1. (In particular, merely confirming a prediction X of A can never push the posterior probability of A above p(A|X), which might still be too small because too many alternative theories also predict X). A falsification, on the other hand, can drive the probability of a theory very low, provided that the theory makes some prediction with high confidence (which needn’t be equal to 1) that has a low prior probability.
That is the sense in which it is true that falsifications tend to be more decisive than confirmations. So, a certain limited and “caveated”, but also more precise and quantifiable, version of Popper’s falsificationism is correct.
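To put rough numbers on this (all of them hypothetical, chosen only for illustration), here is a minimal Python sketch of the two updates. In the confirmation branch, A predicts X with probability 1, yet the posterior is exactly p(A|X), which stays modest because rival theories also predict X; in the falsification branch, A predicts X with probability 0.99, X fails to occur, and A's posterior collapses.

```python
# Hypothetical numbers, for illustration only.
p_A = 0.2          # prior probability of theory A
p_X_given_A = 1.0  # A predicts X with certainty (the best case for confirmation)
p_X = 0.8          # prior probability of X: many rival theories also predict X

# Confirmation: observing X can never push the posterior above p(A | X).
p_A_given_X = p_A * p_X_given_A / p_X
print(f"posterior after confirming X: {p_A_given_X:.3f}")   # 0.250 -- barely moved

# Falsification: now suppose A predicts X with probability 0.99 (not 1),
# while X has low prior probability, say 0.3.
p_X_given_A = 0.99
p_X = 0.3
p_A_given_notX = p_A * (1 - p_X_given_A) / (1 - p_X)
print(f"posterior after X fails: {p_A_given_notX:.4f}")     # ~0.0029 -- nearly dead
```

Both scenarios are coherent (in each, p(X) is consistent with the stated prior and likelihoods); the point is only that the same machinery that caps the confirmation at p(A|X) lets the failed prediction cut A's probability by roughly a factor of seventy.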
But since p(X | A) is always “intermediate,” observing not-X will never strictly falsify A, which is a good thing, since the falsification prong of Popperianism has proven at least as scientifically problematic as the nonverification prong.
Yes, no observation will drive the probability of a theory down to precisely 0. The probability can only be driven very low. That is why I called falsificationism “an exaggerated version of a mathematically valid claim”.
I don’t think falsification can be squared with Bayes, even as a limiting case. In Bayesian theory, verification and falsification are symmetric (as the slider metaphor really indicates). In principle, you can’t strictly falsify a theory empirically any more (or less) than you can verify one. Verification, as the quoted essay confirms, is blocked by the > 0 probability mandatorily assigned to unpredicted outcomes; falsification is blocked by the < 1 probability mandatorily assigned to the expected results. It is no less irrational to be certain that X holds given A than to be certain that X fails given not-A. You are no more justified in assuming absolutely that your abstractions don’t leak than in assuming you can range over all explanations.
As you say, getting to probability 0 is as impossible as getting to probability 1. But getting close to probability 0 is easier than getting equally close to probability 1.
This asymmetry is possible because different kinds of propositions are more or less amenable to being assigned extremely high or low probability. It is relatively easier to show that some data has extremely high or low probability (whether conditional on some theory or a priori) than it is to show that some theory has extremely high conditional probability.
Fix a theory A. It is very hard to think up an experiment with a possible outcome X such that p(A | X) is nearly 1. To do this, you would need to show that no other possible theory, even among the many theories you haven’t thought of, could have a significant amount of probability, conditional on observing X.
It is relatively easy to think up an experiment with a possible outcome X, which your theory A predicts with very high probability, but which has very low prior probability. To accomplish this, you only need to exhibit some other a priori plausible outcomes different from X.
In the second case, you need to show that the probability of some data is extremely high conditional on the theory and extremely low a priori. In the first case, you need to show that the a posteriori probability of a theory is extremely high.
In the second case, you only need to construct enough alternative outcomes to certify your claim. In the first case, you need to prove a universal statement about all possible theories.
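Here is a small sketch of why the second task is certifiable and the first is not, with every number invented for illustration. Theory A puts 0.95 of its mass on one outcome x0; the exhibited alternative outcomes keep the prior probability of x0 well below 1; and the one thing we cannot know, how much mass the unconsidered rival theories put on x0, is simply scanned over.

```python
# Hypothetical sketch: an experiment whose outcomes are a priori spread out, so the
# outcome x0 favoured by theory A has modest prior probability. Rival theories are
# lumped together as "not A"; how much mass they put on x0 is unknown, so we scan it.
p_A = 0.2                 # prior probability of theory A (made up)
p_x0_given_A = 0.95       # A's own confidence in outcome x0 (made up)

for p_x0_given_rivals in (0.05, 0.3, 0.6):
    p_x0 = p_A * p_x0_given_A + (1 - p_A) * p_x0_given_rivals   # prior prob. of x0
    confirm = p_A * p_x0_given_A / p_x0                 # p(A | x0): x0 observed
    falsify = p_A * (1 - p_x0_given_A) / (1 - p_x0)     # p(A | not-x0): x0 fails
    print(f"rivals give x0 prob {p_x0_given_rivals:.2f}: "
          f"confirmation -> {confirm:.2f}, falsification -> {falsify:.3f}")
```

The falsification column stays below a few percent across the whole scan, while the confirmation column swings from about 0.83 down to 0.28 depending on the one quantity that no finite list of named alternatives can pin down.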
One root of the asymmetry is this: As hard as it might be to establish extreme probabilities for data, at least the data usually come from a reasonably well-understood parameter space (the real numbers, say). But the space of all possible theories is not well understood, at least not in any computationally tractable way.
All these arguments are at best suggestive. Our abductive capacities suggest that proving a universal statement about all possible theories isn’t necessarily hard. Your arguments, I think, flow from and then confirm a nominalistic bias: accept concrete data; beware of general theories.
There are universal statements known with greater certainty than any particular data, e.g., that life evolved from inanimate matter and that mind always supervenes on physics.
I agree that some universal statements about all theories are very probable, and that some of our theories are more probable than any particular data.
I’m not seeing why either of these facts is in tension with my previous comment. Would you elaborate?
The claims I made are true of certain priors. I’m not trying to argue you into using such a prior. Right now I only want to make the points that (1) a Bayesian can coherently use a prior satisfying the properties I described, and that (2) falsificationism is true, in a weakened but precise sense, under such a prior.
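For concreteness, here is one fully specified toy prior of the kind point (1) allows, with every number invented purely for illustration: it is probabilistically coherent, and under it a missed prediction moves theory A much further than a confirmed one, which is the weakened falsificationism of point (2).

```python
# A toy, fully specified prior over three hypothetical theories and three outcomes.
# Every number is invented; the only point is that the prior is coherent and that,
# under it, a missed prediction moves theory A much further than a confirmed one.
priors = {"A": 0.3, "B": 0.3, "C": 0.4}
likelihoods = {                         # p(outcome | theory)
    "A": {"x0": 0.90, "x1": 0.05, "x2": 0.05},
    "B": {"x0": 0.80, "x1": 0.15, "x2": 0.05},
    "C": {"x0": 0.10, "x1": 0.60, "x2": 0.30},
}

# Coherence check: the prior over theories and each likelihood row sum to 1.
assert abs(sum(priors.values()) - 1.0) < 1e-9
assert all(abs(sum(row.values()) - 1.0) < 1e-9 for row in likelihoods.values())

def posterior(theory, outcome):
    evidence = sum(priors[t] * likelihoods[t][outcome] for t in priors)
    return priors[theory] * likelihoods[theory][outcome] / evidence

print(posterior("A", "x0"))   # ~0.49: confirming A's favoured outcome leaves it uncertain
print(posterior("A", "x1"))   # 0.05: an outcome A nearly rules out is far more decisive
```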