I disagree that this answers my criticisms. In particular, my section 7 argues that it’s practically infeasible to even write down most practical belief / decision problems in the form that the Bayesian laws require, so “were the laws followed?” is generally not even a well-defined question.
To be a bit more precise, the framework with a complete hypothesis space is a bad model for the problems of interest. As I detailed in section 7, that framework assumes that our knowledge of hypotheses and the logical relations between hypotheses are specified “at the same time,” i.e. when we know about a hypothesis we also know all its logical relations to all other hypotheses, and when we know (implicitly) about a logical relation we also have access (explicitly) to the hypotheses it relates. Not only is this false in many practical cases; I don’t even know of any formalism that would allow us to call it “approximately true,” or “true enough for the optimality theorems to carry over.”
(N.B. as it happens, I don’t think logical inductors fix this problem. But the very existence of logical induction as a research area shows that this is a problem. Either we care about the consequences of lacking logical omniscience, or we don’t—and apparently we do.)
It’s sort of like quoting an optimality result given access to some oracle, when talking about a problem without access to that oracle. If the preconditions of a theorem are not met by the definition of a given decision problem, “meet those preconditions” cannot be part of a strategy for that problem. “Solve a different problem so you can use my theorem” is not a solution to the problem as stated.
Importantly, this is not just an issue of “we can’t do perfect Bayes in practice, but if we were able, it’d be better.” Obtaining the kind of knowledge representation assumed by the Bayesian laws has computational / resource costs, and in any real decision problem, we want to minimize these. If we’re handed the “right” knowledge representation by a genie, fine, but if we are talking about choosing to generate it, that in itself is a decision with costs.
As a side point, I am also skeptical of some of the optimality results.
Let’s zoom in on the oracle problem since it seems to be at the heart of the issue. You write:
It’s sort of like quoting an optimality result given access to some oracle, when talking about a problem without access to that oracle. If the preconditions of a theorem are not met by the definition of a given decision problem, “meet those preconditions” cannot be part of a strategy for that problem.
Here, it seems like you are doing something like interpreting a Msr. Law statement of the form “this Turing machine that has access to a halting oracle decides provability in PA” as a strategy for deciding provability in PA (as Eliezer’s stereotype of Msr. Toolbox would). But the statement is true independent of whether it corresponds to a strategy for deciding provability in PA, and the statement is actually useful in formal logic. Obviously if you wanted to design an automated theorem prover for PA (applicable to some but not all practical problems) you would need a different strategy, and the fact that some specific Turing machine with access to a halting oracle decides provability in PA might or might not be relevant to your strategy.
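To make the shape of that kind of statement concrete, here is a schematic sketch (Python, purely illustrative): the `halting_oracle` argument and the `is_pa_proof` helper are hypothetical stand-ins, and since no computable halting oracle exists this cannot actually be run end-to-end; the point is only that provability in PA is recursively enumerable, so a single oracle query settles it.

```python
from itertools import count, product
from typing import Callable, Iterator

def all_strings(alphabet: str = "01") -> Iterator[str]:
    """Enumerate every finite string over a fixed alphabet (computable)."""
    for n in count(1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def is_pa_proof(candidate: str, phi: str) -> bool:
    """Assumed helper: check whether `candidate` encodes a valid PA-proof of `phi`.

    Proof *checking* is computable; a real checker would parse the proof and
    verify each inference step. Elided here -- illustrative placeholder only.
    """
    raise NotImplementedError("illustrative placeholder")

def decides_pa_provability(phi: str,
                           halting_oracle: Callable[[Callable[[], None]], bool]) -> bool:
    """Schematic: one halting-oracle query decides "PA proves phi".

    `halting_oracle(f)` is assumed to answer whether running f() would ever
    halt. No such computable oracle exists, so this is a true *statement*
    about an oracle-equipped machine, not a usable strategy.
    """
    def search_for_proof() -> None:
        # Exhaustive proof search: halts exactly when some PA-proof of phi exists.
        for candidate in all_strings():
            if is_pa_proof(candidate, phi):
                return
    return halting_oracle(search_for_proof)
```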
I agree that applying Bayes’ law as stated has resource costs and requires formally characterizing the hypothesis space, which is usually (but not always) hard in practice. The consequences of logical non-omniscience really matter, which is one reason that Bayesianism is not a complete epistemology.
I don’t disagree with any of this. But if I understand correctly, you’re only arguing against a very strong claim—something like “Bayes-related results cannot possibly have general relevance for real decisions, even via ‘indirect’ paths that don’t rely on viewing the real decisions in a Bayesian way.”
I don’t endorse that claim, and would find it very hard to argue for. I can imagine virtually any mathematical result playing some useful role in some hypothetical framework for real decisions (although I would be more surprised in some cases than others), and I can’t see why Bayesian stuff should be less promising in that regard than any arbitrarily chosen piece of math. But “Bayes might be relevant, just like p-adic analysis might be relevant!” seems like damning with faint praise, given the more “direct” ambitions of Bayes as advocated by Jaynes and others.
Is there a specific “indirect” path for the relevance of Bayes that you have in mind here?
At a very rough guess, I think Bayesian thinking is helpful in 50-80% of nontrivial epistemic problems, more than p-adic analysis.
How might the law-type properties be indirectly relevant? Here are some cases:
In game theory it’s pretty common to assume that the players are Bayesian about certain properties of the environment (see Bayesian game). Some generality is lost by doing so (after all, reasoning about non-Bayesian players might be useful), but, due to the complete class theorems, less generality is lost than one might think, since (with some caveats) all policies that are not strictly dominated are Bayesian policies with respect to some prior.
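To make that last claim concrete, here is a minimal numerical sketch (Python; the two-state, two-action toy problem is invented for illustration, and a grid search over priors is of course not a proof of the theorem): each non-dominated decision rule turns out to minimize Bayes risk for some prior, while the dominated rule never does.

```python
from itertools import product

# Toy problem (invented for illustration): states theta in {0, 1}, actions
# a in {0, 1}, 0-1 loss, one binary observation x with P(x = theta | theta) = 0.75.
P_X_GIVEN_THETA = {(x, th): 0.75 if x == th else 0.25
                   for x, th in product((0, 1), repeat=2)}

def loss(theta, a):
    return 0.0 if a == theta else 1.0

# Deterministic decision rules: maps from observation x to action a (4 total).
rules = [{0: a0, 1: a1} for a0, a1 in product((0, 1), repeat=2)]

def risk(theta, rule):
    """Frequentist risk: expected loss under theta, averaging over x."""
    return sum(P_X_GIVEN_THETA[(x, theta)] * loss(theta, rule[x]) for x in (0, 1))

def bayes_risk(prior_1, rule):
    """Bayes risk under the prior P(theta = 1) = prior_1."""
    return (1 - prior_1) * risk(0, rule) + prior_1 * risk(1, rule)

def dominated(rule):
    """True if some other rule is at least as good in every state, better in one."""
    return any(all(risk(t, other) <= risk(t, rule) for t in (0, 1))
               and any(risk(t, other) < risk(t, rule) for t in (0, 1))
               for other in rules if other != rule)

# For each rule, check whether it is Bayes-optimal for some prior on a grid.
priors = [i / 100 for i in range(101)]
for rule in rules:
    bayes_for_some_prior = any(
        bayes_risk(p, rule) <= min(bayes_risk(p, r) for r in rules) + 1e-12
        for p in priors)
    print(f"rule {rule}: dominated={dominated(rule)}, "
          f"Bayes for some prior={bayes_for_some_prior}")
# The three non-dominated rules are each Bayes for some prior; the rule that
# inverts the observation is dominated and is never Bayes-optimal.
```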
Sometimes likelihood ratios for different theories with respect to some test can be computed or approximated, e.g. in physics. Bayes’ rule yields a relationship between the prior and posterior probabilities. Even in the absence of a way to determine what the right prior for the different theories is, if we can form a set of “plausible” priors (e.g. based on parsimony of the different theories and existing evidence), then Bayes’ rule yields a set of “plausible” posteriors, which can be narrow even if the set of plausible priors was broad.
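As a toy illustration with invented numbers: in odds form, posterior odds = likelihood ratio × prior odds, so a large likelihood ratio compresses a wide interval of priors into a narrow interval of posteriors.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Posterior P(H | E) from the prior P(H) and LR = P(E | H) / P(E | not-H)."""
    prior_odds = prior / (1 - prior)
    post_odds = likelihood_ratio * prior_odds
    return post_odds / (1 + post_odds)

# Hypothetical numbers: a test outcome 1000x likelier under theory H than
# under its rival, and a "plausible" prior anywhere from 5% to 50%.
lr = 1000.0
for p in (0.05, 0.20, 0.50):
    print(f"prior {p:.2f} -> posterior {posterior(p, lr):.4f}")
# prior 0.05 -> posterior 0.9814
# prior 0.20 -> posterior 0.9960
# prior 0.50 -> posterior 0.9990
# The prior interval spans 45 percentage points; the posterior interval
# spans less than 2.
```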
Bayes’ rule implies properties of belief updates such as conservation of expected evidence. If, in expectation, my beliefs about some proposition will update in a particular direction, then I am expecting myself to violate Bayes’ rule, which implies (by the complete class theorems) that, if the set of decision problems I might face is sufficiently rich, I expect my beliefs to yield some strictly dominated decision rule. It is not clear what to do in this state of knowledge, but the fact that my decision rule is currently strictly dominated does imply that I am somewhat likely to make better decisions if I think about the structure of my beliefs and where the inconsistency is coming from. (In effect, noticing violations of Bayes’ rule is a diagnostic tool, similar to noticing violations of logical consistency.)
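A tiny worked check of conservation of expected evidence, with made-up numbers: averaging the posterior over the possible observations, weighted by how likely I currently think each observation is, recovers the prior, so anticipating a net shift in some direction signals an inconsistency somewhere.

```python
# Made-up numbers: P(H) = 0.3; a binary test with
# P(E | H) = 0.8 and P(E | not-H) = 0.1.
p_h = 0.3
p_e_given_h, p_e_given_not_h = 0.8, 0.1

p_e = p_h * p_e_given_h + (1 - p_h) * p_e_given_not_h          # P(E) = 0.31
post_if_e = p_h * p_e_given_h / p_e                            # P(H | E)
post_if_not_e = p_h * (1 - p_e_given_h) / (1 - p_e)            # P(H | not-E)

expected_posterior = p_e * post_if_e + (1 - p_e) * post_if_not_e
print(post_if_e, post_if_not_e, expected_posterior)
# ~0.7742, ~0.0870, and 0.3 (up to floating point): the expected posterior
# equals the prior, so "I expect to end up more confident than I am now"
# cannot be Bayes-consistent.
```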
I do think that some advocacy of Bayesianism has been overly ambitious, for the reasons stated in your post as well as those in this post. I think Jaynes in particular is overly ambitious in applications of Bayesianism, such as in recommending maximum-entropy models as an epistemological principle rather than as a useful tool. And I think this post by Eliezer (which you discussed) overreaches in a few ways. I still think that “Strong Bayesianism” as you defined it is a strawman, though there is some cluster in thoughtspace that could be called “Strong Bayesianism” that both of us would have disagreements with.
(As an aside: as far as I can tell, the entire A_p section of Jaynes’s Probability Theory: The Logic of Science is logically inconsistent.)