CyrilDan comments on Am I Understanding Bayes Right?

CyrilDan 15 Nov 2013 7:35 UTC
0 points
First of all, let me thank you so much, MrMind, for your post. It was really helpful, and I greatly appreciate how much work you put into it!

I’ll try to give you the formalist perspective, which is a sort of ‘minimal’ take on the whole matter.

Much obliged.

Everything starts with a set of symbols, usually finite, that can be combined to form strings called formulas.

Question. I’m making my way through George Lakoff’s works on metaphor and embodied thought; are you familiar with the theory at all? (I know lukeprog did a blog post about them, but it’s not nearly everything there is to know.) Basically the theory is that our most basic understandings are linked to our direct sensory experience, and then we abstract away from that metaphorically in various fields, a very bottom-up approach. Whereas what you’re saying is starting with symbols, which I think would be the reverse of what he’s saying? That’s probably just a difference of perspective, but as a starting point it gives the concepts less ballast for me. That said, I’m not entirely lost; I think I mentioned that I’ve studied symbolic logic, so I’ll brave ahead!

Then there’s the concept of truth: when you have a logic, you notice that sometimes formulas refer to entities or states of some environment, and that syntactic rules somehow reflect processes happening between those entities. Specifying which environment, which processes and which entities you are considering is the purpose of ontology, while the task of relating ontology and morphology/syntax is the purpose of semantics.

As you can probably imagine, there are a myriad of logics and myriads of ontologies (often called models).

How does this connect to the map-territory distinction? Generally as I’ve understood it, logic is a form of map, but so too would be a model. Would a model be a map and logic be a map of a map? Am I getting that right?

All three of them, frequentists, subjectivists and Bayesians, believe that the structure of probability is correctly described by the mathematical concept of a measure, as formalized by the Kolmogorov axioms.

This is something that has always confused me, the probability definition wars. Is there really something to argue about here? Maybe I’m missing something, but it seems like an “if a tree falls in the woods...” kind of question that should just be taboo’d. But when you taboo frequency-probability off from epistemic-probability, it’s not immediately obvious why the same axioms should apply to both of them (which doesn’t mean that they don’t; thank you to everyone for pointing me to Cox’s Theorems again. I know I’ve seen them before, but I think they’re starting to click a little bit more on this pass-over). And Richard Carrier’s new book said that they’re actually the same thing, which is just confusing (that epistemic probability is the frequency at which beliefs with the same amount of evidence will be true/false, or something like that). (EDIT: Another possibility would be that the frequentist and Bayesian definitions could both be “probability” and both conform to the axioms, but that would just make it more perplexing that people argue about it.)

As you can see, you are just using one ontology (possible worlds) to justify one interpretation (Kolmogorov measure), but there are many more.

Thanks for the terminology. I don’t really understand what they are given so brief a description, but knowing the names at least spurs further research. Also, am I doing it right for the one ontology and one interpretation that I’ve stumbled across, regardless of the others?

Fuzzy logic resembles PTEL in the expansions of the set of truth values, but uses different rules than CL, so the resemblance is only superficial: PTEL and fuzzy logics are two very different beasts.

Right, because in fuzzy logics the spectrum is the truth value (because being hot/cold, near/far, gay/straight, sexual/asexual, etc. is not an either/or), whereas with PTEL the spectrum is the level of certainty in a more staunch true/false dichotomy, right? I don’t actually know fuzzy logic, I just know the premise of it.

The other question I forgot to ask in the first post was how Bayes’ Theorem interacts with group identity not being a matter of necessary and sufficient conditions, or for other fuzzy concepts like I mentioned earlier (near/far, &c.). For this would you just pick a mostly-arbitrary concept boundary so that you have a binary truth value to work with?
- MrMind 15 Nov 2013 10:49 UTC
  1 point
  
  I’m making my way through George Lakoff’s works on metaphor and embodied thought; are you familiar with the theory at all?
  
  Unfortunately no, but from your description it seems quite like the theory of the mind of General Semantics.
  
  Whereas what you’re saying is starting with symbols, which I think would be the reverse of what he’s saying?
  
  Not exactly, because in the end symbols are just units of perception, all distinct from one another. But while Lakoff’s theory probably aims at psychology, logic is a denotational and computational tool, so it doesn’t really matter if they aren’t perfect inverses.
  
  How does this connect to the map-territory distinction? Generally as I’ve understood it, logic is a form of map, but so too would be a model. Would a model be a map and logic be a map of a map? Am I getting that right?
  
  Yes. Since a group of maps can be seen just as a set of things in itself, it can be treated as a valid territory. In logic there are also map/territory loops, where the formulas themselves become the territory mapped by the same formulas (akin to talking in English about the English language). This trick is used for example in Goedel’s and Tarski’s theorems.
  
  This is something that has always confused me, the probability definition wars. Is there really something to argue about here?
  
  Yes. Basically the Bayesian definition is more inclusive: e.g. the frequency interpretation gives no definition of the probability of a single coin toss, but the Bayesian one does. Also, in the Bayesian take on probability the frequentist definition emerges as a natural by-product. Plus, the Bayesian framework untangled a lot of frequentist statistics and introduced more powerful methods.
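One way to see both points at once is a small sketch, under assumptions that are not in the thread: a uniform Beta(1, 1) prior over the coin's bias and a hypothetical true bias of 0.7. The credence for a single toss is defined before any data exists, and after many tosses the posterior mean sits close to the observed long-run frequency. The function name and constants are illustrative only.

```python
import random

def posterior_mean_heads(flips, prior_heads=1.0, prior_tails=1.0):
    """Posterior mean of the heads probability under a Beta prior.

    flips is a list of 0/1 outcomes; the prior pseudo-counts are an
    assumption of this sketch (Beta(1, 1) is the uniform prior).
    """
    heads = sum(flips)
    return (prior_heads + heads) / (prior_heads + prior_tails + len(flips))

# A single toss with no data: the Bayesian credence is still well defined.
single_toss_credence = posterior_mean_heads([])

# Many tosses of a coin with a hypothetical bias: the posterior mean
# tracks the observed frequency of heads.
rng = random.Random(0)
TRUE_BIAS = 0.7  # hypothetical value, chosen only for illustration
flips = [1 if rng.random() < TRUE_BIAS else 0 for _ in range(10_000)]

observed_frequency = sum(flips) / len(flips)
posterior_mean = posterior_mean_heads(flips)

print(single_toss_credence, observed_frequency, posterior_mean)
```

With n tosses the gap between the two numbers is at most 1/(n + 2), so the prior washes out as data accumulates.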
  
  thank you to everyone for pointing me to Cox’s Theorems again. I know I’ve seen them before, but I think they’re starting to click a little bit more on this pass-over
  
  The first two chapters of Jaynes’ book, a pre-print version of which is available online for free, do a great job in explaining and using Cox to derive Bayesian probability. I urge you to read them to fully grasp this point of view.
  
  And Richard Carrier’s new book said that they’re actually the same thing, which is just confusing
  
  And easily falsifiable.
  
  Also, am I doing it right for the one ontology and one interpretation that I’ve stumbled across, regardless of the others?
  
  Yes, but remember that this measure interpretation of probability requires the set of possible worlds to be measurable, which is a very special condition to impose on a set. It is certainly very intuitive, but technically burdensome. If you plan to work with probability, it’s better to start from a cleaner model.
  
  Right, because in fuzzy logics the spectrum is the truth value (because being hot/cold, near/far, gay/straight, sexual/asexual, etc. is not an either/or), whereas with PTEL the spectrum is the level of certainty in a more staunch true/false dichotomy, right?
  
  Yes. Fuzzy logic has an infinity of truth values for its propositions, while in PTEL every proposition is ‘in reality’ just true or false, you just don’t know which is which, and so you track your certainty with a real number.
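A minimal sketch of why the resemblance is only superficial, assuming the common Zadeh operators for fuzzy logic (min for "and", 1 - x for "not"); the thread does not say which fuzzy rules are meant, so treat that choice as an assumption. The same number 0.7 behaves differently depending on whether it is a degree of truth or a degree of certainty.

```python
def fuzzy_not(x: float) -> float:
    # Zadeh negation (an assumed choice of fuzzy operator).
    return 1.0 - x

def fuzzy_and(x: float, y: float) -> float:
    # Zadeh conjunction (an assumed choice of fuzzy operator).
    return min(x, y)

# Fuzzy reading: "the tea is cold" is true to degree 0.7.
tea_cold_degree = 0.7
fuzzy_contradiction = fuzzy_and(tea_cold_degree, fuzzy_not(tea_cold_degree))

# PTEL reading: in each possible world the tea simply is or is not cold,
# and 0.7 is the credence placed on the worlds where it is.
credence = {True: 0.7, False: 0.3}
prob_contradiction = sum(
    p for is_cold, p in credence.items() if is_cold and not is_cold
)

print(fuzzy_contradiction)  # nonzero: "cold and not cold" is partly true
print(prob_contradiction)   # zero: no world makes a contradiction true
```

Under the fuzzy reading "cold and not cold" keeps a nonzero truth value, while under the probabilistic reading it gets no credence at all, because classical logic still governs each world.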
  
  The other question I forgot to ask in the first post was how Bayes’ Theorem interacts with group identity not being a matter of necessary and sufficient conditions, or for other fuzzy concepts like I mentioned earlier (near/far, &c.). For this would you just pick a mostly-arbitrary concept boundary so that you have a binary truth value to work with?
  
  Yes, in PTEL you already have real numbers, so it’s not difficult to just say “The tea is 0.7 cold”, and provided you have a clean (that is, classical) interpretation for this, the sentence is just true or false. Then you can quantify your uncertainty: “I give 0.2 credence to the belief that the tea is 0.7 cold”. More generally, “I give y credence to the belief that the tea is x cold”.
  What comes out is a probability distribution, that is the assignment of a probability value to every value of a parameter (in this case, the coldness of tea). Notice that this would be impossible in the frequentist interpretation.
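As a rough sketch of what such a distribution looks like and how Bayes' theorem acts on it: every number below is hypothetical except the 0.2 credence on "0.7 cold" taken from the example above, and the likelihood (the chance that the cup feels cool to the touch, taken to equal the coldness x) is an assumption made purely for illustration.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over parameter values: prior times likelihood, renormalised."""
    unnormalised = {x: p * likelihood(x) for x, p in prior.items()}
    total = sum(unnormalised.values())
    return {x: w / total for x, w in unnormalised.items()}

# Credence y assigned to each coldness value x (hypothetical numbers,
# apart from the 0.2 on x = 0.7 used in the example above).
prior = {0.1: 0.05, 0.3: 0.15, 0.5: 0.30, 0.7: 0.20, 0.9: 0.30}

def feels_cool_likelihood(x):
    # Assumed model: the colder the tea, the likelier the cup feels cool.
    return x

posterior = bayes_update(prior, feels_cool_likelihood)

for x in sorted(posterior):
    print(f"tea is {x} cold: prior {prior[x]:.2f}, posterior {posterior[x]:.3f}")
```

Each statement "the tea is x cold" stays plainly true or false; only the credences move, shifting toward the colder values once the cup is felt to be cool.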
  - CyrilDan 15 Nov 2013 19:13 UTC
    0 points
    Unfortunately no, but from your description it seems quite like the theory of the mind of General Semantics.
    
    I think it’s similar, but Lakoff focuses more on how things are abstracted away. For example, because in childhood affection is usually associated with warmth (e.g. through hugs), the different areas of your brain that code for those things become linked (“neurons that fire together, wire together”). This then becomes the basis of a cognitive metaphor, Affection Is Warmth, such that we can also say “She has a warm smile” or “He gave me the cold shoulder” even though we’re not talking literally about body temperature.
    
    Similarly, in Where Mathematics Comes From: How The Embodied Mind Brings Mathematics Into Being, he summarises his chapter “Boole’s Metaphor: Classes and Symbolic Logic” thusly:
    
    There is evidence … that Container schemas are grounded in the sensory-motor system of the brain, and that they have inferential structures like those just discussed. These include Container schema versions of the four inferential laws of classical logic.
    We know … that conceptual metaphors are cognitive cross-domain mappings that preserve inferential structure.
    … [W]e know that there is a Classes are Containers metaphor. This grounds our understanding of classes, by mapping the inferential structure of embodied Container schemas to classes as we understand them.
    Boole’s metaphor and the Propositional Logic metaphor have been carefully crafted by mathematicians to mathematicize classes and map them onto propositional structures.
    The symbolic-logic mapping was also crafted by mathematicians, so that propositional logic could be made into a symbolic calculus governed by “blind” rules of symbol manipulation.
    Thus, our understanding of symbolic logic traces back via metaphorical and symbolic mappings to the inferential structure of embodied Container schemas.
    
    That’s what I was getting at above, but I’m not sure I explained it very well. I’m less eloquent than Mr. Lakoff is, I think.
    
    Yes. Since a group of maps can be seen just as a set of things in itself, it can be treated as a valid territory. In logic there are also map/territory loops, where the formulas themselves become the territory mapped by the same formulas (akin to talking in English about the English language). This trick is used for example in Goedel’s and Tarski’s theorems.
    
    Hmm interesting. I should become more familiar with those.
    
    Yes. Basically the Bayesian definition is more inclusive: e.g. the frequency interpretation gives no definition of the probability of a single coin toss, but the Bayesian one does. Also, in the Bayesian take on probability the frequentist definition emerges as a natural by-product. Plus, the Bayesian framework untangled a lot of frequentist statistics and introduced more powerful methods.
    
    Oh right for sure, another historical example would be “What’s the probability of a nuclear reactor melting down?” before any nuclear reactors had melted down. But I mean, even if the Bayesian definition covers more than the frequentist definition (which it definitely does), why not just use both definitions and understand that one application is a subset of the other application?
    
    The first two chapters of Jaynes’ book, a pre-print version of which is available online for free, do a great job in explaining and using Cox to derive Bayesian probability. I urge you to read them to fully grasp this point of view.
    
    Right, I think I found the whole thing online, actually. And the first chapter I understood pretty much without difficulty, but the second chapter gave me brainhurt, so I put it down for a while. I think it might be that I never took calculus in school? (something I now regret, oddly enough for the general population) So I’m trying to become stronger before I go back to it. Do you think that getting acquainted with Cox’s Theorem in general would make Jaynes’ particular presentation of it easier to digest?
    
    Yes...
    
    Yes...
    
    Yes...
    
    Hooray, I understand some things!
    - MrMind 18 Nov 2013 10:36 UTC
      0 points
      
      But I mean, even if the Bayesian definition covers more than the frequentist definition (which it definitely does), why not just use both definitions and understand that one application is a subset of the other application?
      
      You’ll have to ask a frequentist :)
      Bayesians use both definitions (even though they call long-run frequency… well, long-run frequency), but frequentists refuse to acknowledge the Bayesian definition of probability and its methods.
      
      but the second chapter gave me brainhurt, so I put it down for a while. I think it might be that I never took calculus in school? (something I now regret, oddly enough for the general population) So I’m trying to become stronger before I go back to it. Do you think that getting acquainted with Cox’s Theorem in general would make Jaynes’ particular presentation of it easier to digest?
      
      I skipped the whole derivation too; it was not interesting. What is important is at the end of the chapter: developing Cox’s requirements leads to the product and negation rules, and that’s all you need.
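For reference, the two rules can be written out as follows. The notation is an assumption of this note rather than a quotation: $P(A \mid C)$ is the plausibility of $A$ given background information $C$, $AB$ means "$A$ and $B$", and $\bar{A}$ means "not $A$". Bayes' theorem then takes one line, because the product rule can be expanded in either order.

```latex
% Product rule
P(AB \mid C) = P(A \mid BC)\, P(B \mid C) = P(B \mid AC)\, P(A \mid C)

% Negation (sum) rule
P(A \mid C) + P(\bar{A} \mid C) = 1

% Bayes' theorem: equate the two expansions of the product rule
% and divide by P(B | C), assuming P(B | C) > 0
P(A \mid BC) = \frac{P(B \mid AC)\, P(A \mid C)}{P(B \mid C)}
```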