My guess is that you would still be in favor of Bayes as a normative standard of epistemology even if you rejected Dutch book arguments, and the reason why you like it is because you feel like it has been useful for solving a large number of problems.
Um, nope. What it would really take to change my mind about Bayes is seeing a refutation of Dutch Book and Cox’s Theorem and Von Neumann-Morgenstern and the complete class theorem , combined with seeing some alternative epistemology (e.g. Dempster-Shafer) not turn out to completely blow up when subjected to the same kind of scrutiny as Bayesianism (the way DS brackets almost immediately go to [0-1] and fuzzy logic turned out to be useless etc.)
Neural nets have been useful for solving a large number of problems. It doesn’t make them good epistemology. It doesn’t make them a plausible candidate for “Yes, this is how you need to organize your thinking about your AI’s thinking and if you don’t your AI will explode”.
some of which Bayesian statistics cannot solve, as I have demonstrated in this post.
I am afraid that your demonstration was not stated sufficiently precisely for me to criticize. This seems like the sort of thing for which there ought to be a standard reference, if there were such a thing as a well-known problem which Bayesian epistemology could not handle. For example, we have well-known critiques and literature claiming that nonconglomerability is a problem for Bayesianism, and we have a chapter of Jaynes which neatly shows that they all arise from misuse of limits on infinite problems. Is there a corresponding literature for your alleged reductio of Bayesianism which I can consult? Now, I am a great believer in civilizational inadequacy and the fact that the incompetence of academia is increasing, so perhaps if this problem was recently invented there is no more literature about it. I don’t want to be a hypocrite about the fact that sometimes something is true and nobody has written it up anyway, heaven knows that’s true all the time in my world. But the fact remains that I am accustomed to somewhat more detailed math when it comes to providing an alleged reductio of the standard edifice of decision theory. I know your time is limited, but the real fact is that I really do need more detail to think that I’ve seen a criticism and be convinced that no response to that criticism exists. Should your flat assertion that Bayesian methods can’t handle something and fall flat so badly as to constitute a critique of Bayesian epistemology, be something that I find convincing?
We’ve already discussed this in one of the other threads, but I’ll just repeat here that this isn’t correct. With overwhelmingly high probability a Gaussian matrix will satisfy the restricted isometry property, which implies that appropriately L1-regularized least squares will return the exact solution.
Okay. Though I note that you haven’t actually said that my intuitions (and/or my reading of Wikipedia) were wrong; many NP-hard problems will be easy to solve for a randomly generated case.
Anyway, suppose a standard L1-penalty algorithm solves a random case of this problem. Why do you think that’s a reductio of Bayesian epistemology? Because the randomly generated weights mean that a Bayesian viewpoint says the credibility is going as the L2 norm on the non-zero weights, but we used an L1 algorithm to find which weights were non-zero? I am unable to parse this into the justifications I am accustomed to hearing for rejecting an epistemology. It seems like you’re saying that one algorithm is more effective at finding the maximum of a Bayesian probability landscape than another algorithm; in a case where we both agree that the unbounded form of the Bayesian algorithm would work.
What destroys an epistemology’s credibility is a case where even in the limit of unbounded computing power and well-calibrated prior knowledge, a set of rules just returns the wrong answer. The inherent subjectivity of p-values as described in http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequently_subjective/ is not something you can make go away with a better-calibrated prior, correct use of limits, or unlimited computing power; it’s the result of bad epistemology. This is the kind of smoking gun it would take to make me stop yammering about probability theory and Bayes’s rule. Showing me algorithms which don’t on the surface seem Bayesian but find good points on a Bayesian fitness landscape isn’t going to cut it!
Eliezer, I included a criticism of both complete class and Dutch book right at the very beginning, in Myth 1. If you find them unsatisfactory, can you at least indicate why?
Your criticism of Dutch Book is that it doesn’t seem to you useful to add anti-Dutch-book checkers to your toolbox. My support of Dutch Book is that if something inherently produces Dutch Books then it can’t be the right epistemological principle because clearly some of its answers must be wrong even in the limit of well-calibrated prior knowledge and unbounded computing power.
The complete class theorem I understand least of the set, and it’s probably not very much entwined with my true rejection so it would be logically rude to lead you on here. Again, though, the point that every local optimum is Bayesian tells us something about non-Bayesian rules producing intrinsically wrong answers. If I believed your criticism, I think it would be forceful; I could accept a world in which for every pair of a rational plan with a world, there is an irrational plan which does better in that world, but no plausible way for a cognitive algorithm to output that irrational plan—the plans which are equivalent of “Just buy the winning lottery ticket, and you’ll make more money!” I can imagine being shown that the complete class theorem demonstrates only an “unfair” superiority of this sort, and that only frequentist methods can produce actual outputs for realistic situations even in the limit of unbounded computing power. But I do not believe that you have leveled such a criticism. And it doesn’t square very much with my current understanding that the decision rules being considered are computable rules from observations to actions. You didn’t actually tell me about a frequentist algorithm which is supposed to be realistic and show why the Bayesian rule which beats it is beating it unfairly.
If you want to hit me square in the true rejection I suggest starting with VNM. The fact that our epistemology has to plug into our actions is one reason why I roll my eyes at the likes of Dempster-Shafer or frequentist confidence intervals that don’t convert to credibility distributions.
I could accept a world in which for every pair of a rational plan with a world, there is an irrational plan which does better in that world, but no plausible way for a cognitive algorithm to output that irrational plan
We already live in that world.
(The following is not evidence, just an illustrative analogy) Ever seen Groundhog Day? Imagine him skipping the bulk of the movie and going straight to the last day. It is straight wall to wall WTF but it’s very optimal.
One of the criticisms I raised is that merely being able to point to all the local optima is not a particularly impressive property of an epistemological theory. Many of those local optima will be horrible! (My criticism of VNM is essentially the same.)
Many frequentist methods, such as minimax, also provide local optima, but they provide local optima which actually have certain nice properties. And minimax provides a complete decision rule, not just a probability distribution, so it plugs directly into actions.
Um, nope. What it would really take to change my mind about Bayes is seeing a refutation of Dutch Book and Cox’s Theorem and Von Neumann-Morgenstern and the complete class theorem , combined with seeing some alternative epistemology (e.g. Dempster-Shafer) not turn out to completely blow up when subjected to the same kind of scrutiny as Bayesianism (the way DS brackets almost immediately go to [0-1] and fuzzy logic turned out to be useless etc.)
Neural nets have been useful for solving a large number of problems. It doesn’t make them good epistemology. It doesn’t make them a plausible candidate for “Yes, this is how you need to organize your thinking about your AI’s thinking and if you don’t your AI will explode”.
I am afraid that your demonstration was not stated sufficiently precisely for me to criticize. This seems like the sort of thing for which there ought to be a standard reference, if there were such a thing as a well-known problem which Bayesian epistemology could not handle. For example, we have well-known critiques and literature claiming that nonconglomerability is a problem for Bayesianism, and we have a chapter of Jaynes which neatly shows that they all arise from misuse of limits on infinite problems. Is there a corresponding literature for your alleged reductio of Bayesianism which I can consult? Now, I am a great believer in civilizational inadequacy and the fact that the incompetence of academia is increasing, so perhaps if this problem was recently invented there is no more literature about it. I don’t want to be a hypocrite about the fact that sometimes something is true and nobody has written it up anyway, heaven knows that’s true all the time in my world. But the fact remains that I am accustomed to somewhat more detailed math when it comes to providing an alleged reductio of the standard edifice of decision theory. I know your time is limited, but the real fact is that I really do need more detail to think that I’ve seen a criticism and be convinced that no response to that criticism exists. Should your flat assertion that Bayesian methods can’t handle something and fall flat so badly as to constitute a critique of Bayesian epistemology, be something that I find convincing?
Okay. Though I note that you haven’t actually said that my intuitions (and/or my reading of Wikipedia) were wrong; many NP-hard problems will be easy to solve for a randomly generated case.
Anyway, suppose a standard L1-penalty algorithm solves a random case of this problem. Why do you think that’s a reductio of Bayesian epistemology? Because the randomly generated weights mean that a Bayesian viewpoint says the credibility is going as the L2 norm on the non-zero weights, but we used an L1 algorithm to find which weights were non-zero? I am unable to parse this into the justifications I am accustomed to hearing for rejecting an epistemology. It seems like you’re saying that one algorithm is more effective at finding the maximum of a Bayesian probability landscape than another algorithm; in a case where we both agree that the unbounded form of the Bayesian algorithm would work.
What destroys an epistemology’s credibility is a case where even in the limit of unbounded computing power and well-calibrated prior knowledge, a set of rules just returns the wrong answer. The inherent subjectivity of p-values as described in http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequently_subjective/ is not something you can make go away with a better-calibrated prior, correct use of limits, or unlimited computing power; it’s the result of bad epistemology. This is the kind of smoking gun it would take to make me stop yammering about probability theory and Bayes’s rule. Showing me algorithms which don’t on the surface seem Bayesian but find good points on a Bayesian fitness landscape isn’t going to cut it!
Eliezer, I included a criticism of both complete class and Dutch book right at the very beginning, in Myth 1. If you find them unsatisfactory, can you at least indicate why?
Your criticism of Dutch Book is that it doesn’t seem to you useful to add anti-Dutch-book checkers to your toolbox. My support of Dutch Book is that if something inherently produces Dutch Books then it can’t be the right epistemological principle because clearly some of its answers must be wrong even in the limit of well-calibrated prior knowledge and unbounded computing power.
The complete class theorem I understand least of the set, and it’s probably not very much entwined with my true rejection so it would be logically rude to lead you on here. Again, though, the point that every local optimum is Bayesian tells us something about non-Bayesian rules producing intrinsically wrong answers. If I believed your criticism, I think it would be forceful; I could accept a world in which for every pair of a rational plan with a world, there is an irrational plan which does better in that world, but no plausible way for a cognitive algorithm to output that irrational plan—the plans which are equivalent of “Just buy the winning lottery ticket, and you’ll make more money!” I can imagine being shown that the complete class theorem demonstrates only an “unfair” superiority of this sort, and that only frequentist methods can produce actual outputs for realistic situations even in the limit of unbounded computing power. But I do not believe that you have leveled such a criticism. And it doesn’t square very much with my current understanding that the decision rules being considered are computable rules from observations to actions. You didn’t actually tell me about a frequentist algorithm which is supposed to be realistic and show why the Bayesian rule which beats it is beating it unfairly.
If you want to hit me square in the true rejection I suggest starting with VNM. The fact that our epistemology has to plug into our actions is one reason why I roll my eyes at the likes of Dempster-Shafer or frequentist confidence intervals that don’t convert to credibility distributions.
We already live in that world.
(The following is not evidence, just an illustrative analogy) Ever seen Groundhog Day? Imagine him skipping the bulk of the movie and going straight to the last day. It is straight wall to wall WTF but it’s very optimal.
One of the criticisms I raised is that merely being able to point to all the local optima is not a particularly impressive property of an epistemological theory. Many of those local optima will be horrible! (My criticism of VNM is essentially the same.)
Many frequentist methods, such as minimax, also provide local optima, but they provide local optima which actually have certain nice properties. And minimax provides a complete decision rule, not just a probability distribution, so it plugs directly into actions.
FYI, there are published counterexamples to Cox’s theorem. See for example Joseph Halpern’s at http://arxiv.org/pdf/1105.5450.pdf.
You need to not include the period in your link, like so.