Questions for the first part of Chapter 1:
Compare Jaynes’ framing of probability theory with your previous conceptions of “probability”. What are the differences?
What do you make of Jaynes’ observation that plausible inference is concerned with logical connections, and must be carefully distinguished from physical causation?
(If you can think of other/better questions, please ask away!)
Speaking of Chapter 1, it seems worth pointing out another issue that may be unclear on a superficial reading.
The author introduces the notion of a reasoning “robot” that maintains a consistent set of “plausibility” values (probabilities) according to a small set of rules.
To a modern reader, it may give the impression that the author is suggesting a practical algorithm or implementation of an artificial intelligence that uses Bayesian inference as its reasoning process.
I think this misses the point completely. First: it is clear that consistently maintaining such a system of probability values, even for a set of simple Boolean formulas, amounts to solving SAT problems and is therefore computationally infeasible in general.
Rather, the author’s purpose in introducing the “robot” was to avoid the misconception that the plausibility desiderata are subjective, inaccurate notions that depend on hidden features of the human mind. By detaching the inference rules from the human mind and attaching them to an idealized “robot”, the author wants to argue that these axioms and their consequences can and should be studied mathematically, independently of all other features and aspects of human thinking and rationality.
So the objective here was not to build an intelligence, but rather to study an abstract and computationally unconstrained version of intelligence obeying the above principles alone.
Such an AI will never be realized in practice (due to inherent complexity limitations, and here I don’t just mean P!=NP!). Still, if we can prove what this theoretical AI would have to do in certain specific situations, then we can learn important lessons about the above principles, or even guide our decisions by the insights gained from that study.
I agree that Jaynes is using the robot as a literary device to get a point across.
If I understood you correctly, it seems you’re sneaking in an additional claim that a Bayesian AI is theoretically impossible due to computational concerns. That should be discussed separately, but the obvious counterargument is that while, say, exact inference in Bayes nets has been proved intractable, approximate inference does well on good-sized problems, and approximate does not mean it’s not Bayesian.
Sorry, I never tried to imply that an AI built on the Bayesian principles is impossible or even a bad idea. (Probably, using Bayesian inference is a fundamentally good idea.)
I just tried to point out that easy-looking principles don’t necessarily translate to practical implementations in a straightforward manner.
What then do you make of Jaynes’ observation in the Comments: “Our present model of the robot is quite literally real, because today it is almost universally true that any nontrivial probability evaluation is performed by a computer”?
In my reading it means that there are already actual implementations of all the probabilistic inference operations that the author considers in the book.
This was probably already a true statement in the ’60s. It does not mean that the robot as a whole is feasible resource-wise.
An analogy: it is not hard to implement all the (non-probabilistic) logical derivation rules. It is also straightforward to use them to enumerate all provable mathematical theorems (e.g. within ZFC). However, this does not imply that we have a practical (i.e. efficient) general-purpose mathematical theorem prover. It gives an algorithm that eventually proves every provable theorem, but its run-time makes the approach practically useless.
I assume you mean in the sense that deciding satisfiability of arbitrary propositions (over uncertain variables; certainly true/false ones can be simplified out) is NP-complete. Of course I mean that a variable v is uncertain if 0<p(v)<1.
Actually, solving SAT problems is just the simplest case. Even when you have certain variables (with either 0 or 1 plausibility), it’s still NP-complete; you can’t just simplify them out in polynomial time. [EDIT: This is wrong, as Jonathan pointed out.]
In the extreme case, since we also have the rule that the “robot” has to use all available information to the fullest extent, the “robot” must be insanely powerful. For example, if the calculation of some plausibility value depends on the correctness of an algorithm (known to the “robot” with very high probability), then it will have to be able to solve the halting problem in general.
Even if you constrain your probability values so that nothing is ever certain or impossible, you can always choose small (or large) enough values that the computation of the probabilities can be used to solve the discrete version of the problem.
For example, in the simplest case: if you just have a set of propositions (let us say in conjunctive normal form), the consistency desideratum implies the ability of the “robot” to solve SAT problems, even if the starting plausibility values for the literals fall in the open (0,1) interval.
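A minimal sketch of that point (my own illustration, not from the book or the thread): if the robot can compute the exact probability of an arbitrary CNF formula whose literals all have probabilities strictly inside (0, 1), satisfiability can be read off directly, because the formula’s probability is nonzero exactly when at least one satisfying assignment exists.

```python
from itertools import product

def cnf_probability(clauses, p):
    """Probability that a CNF formula is true, assuming independent variables.

    clauses: list of clauses; each clause is a list of signed ints
             (3 means x3, -3 means NOT x3), DIMACS-style.
    p:       dict mapping variable index -> probability of being true,
             every value strictly between 0 and 1.
    Brute force over all assignments -- exponential, which is the point.
    """
    variables = sorted(p)
    total = 0.0
    for bits in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, bits))
        # probability weight of this particular "possible world"
        w = 1.0
        for v in variables:
            w *= p[v] if world[v] else (1.0 - p[v])
        # does this world satisfy every clause?
        satisfied = all(
            any(world[abs(lit)] == (lit > 0) for lit in clause)
            for clause in clauses
        )
        if satisfied:
            total += w
    return total

# (x1 OR x2) AND (NOT x1 OR x2) AND (NOT x2 OR x3)
clauses = [[1, 2], [-1, 2], [-2, 3]]
p = {1: 0.4, 2: 0.5, 3: 0.6}
prob = cnf_probability(clauses, p)
print(prob)        # some value strictly between 0 and 1
print(prob > 0)    # True  <=>  the formula is satisfiable
```

So an oracle that returns exact probabilities for arbitrary formulas is at least as powerful as a SAT solver, which is the sense in which the consistency desideratum is computationally demanding.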
I think you misunderstood. The robot has a real number p(v) for every v. Let’s grant an absolute min and max of 0 and 1. My point was simply that when p(v)=0 or p(v)=1, v can be simplified out of propositions using it.
I understand why computing the probability of a proposition implies answering whether it’s satisfiable.
Sorry for the confusion. I was being too superficial. Of course, you are correct about being able to simplify out those values.
I never thought about the connection between logic and probability before, though now it seems obvious. I’ve read a few introductory logic texts, and deductive reasoning always seemed a bit pointless to me (in real life, premises are usually inferred from something).
To draw from a literary example, Sherlock Holmes’s use of the word “deduce” always seemed a bit deceptive. You can say, “That color of dirt exists only in spot x in London. Therefore, that Londoner must have come in contact with spot x if I see that dirt on his trouser knee.” This is presented as a deduction, but really the premises are induced, and he assumes some things about how people travel.
It seems more likely that we make inferences, not deductions, but convince ourselves that the premises must be true, without bothering to put real information about likelihood into the reasoning. An induction is still a logical statement, but I like the idea of using probability to quantify it.
As far as I can tell, Holmes actually engages in what Charles Sanders Peirce called “abduction”. It is neither deduction nor induction.
I agree that Holmes is neither deducing nor “inducing”, but I don’t like this concept of “abductive inference”.
It’s obvious that what we’re after is the best explanation of the data we’ve collected, so it’s never wrong to attempt to find the best explanation, but as advice, or as a description of how a rational agent proceeds, it’s as useless as the advice to win a game of football by scoring more goals than the opposing side.
Perhaps yes… but… I have found over time that paying attention to interesting but weird features of a domain leads to interesting places. The tractability problems inherent in some computational models of Bayesian reasoning make me suspect that “something else” is being used as “the best that can be physically realized for now” to do whatever it is that brains do. When evolutionary processes produce a result, they generally use principles that are shockingly beautiful and simple once you see them.
I had not previously heard of the term “abductive reasoning” but catching terms like this is one of the reasons I love this community. The term appears to connect with something I was in a discussion about called “cogent confabulation”. (Thanks for the heads up, Jayson!)
The obvious thing that jumps out is that what Hecht-Nielsen called “cogency” is strikingly similar to both Jaynes’ policeman example and the example of Sherlock Holmes. I’m tempted to speculate that the same “architectural quirk” in human brains that supports this (whatever it turns out to be) may also be responsible (on the downside) for both the Prosecutor’s Fallacy and our notoriously crappy performance with modus tollens.
Given the inferential distance between me and the many-handed horror, this makes me think there is something clever to be said for whatever that quirk turns out to be. Maybe storing your “cause given evidence” conditional probabilities and your causal base rates all packed into a single number is useful for some reason? If I were to abduct a reason, it would be managing “salience” when trying to implement a practically focused behavior-generating system that has historically been strongly resource-limited. It’s just a guess until I see evidence one way or the other… but that would be my “working hunch” until then :-)
Along with the distinction between causal and logical connections, when considering the conditional premise of the syllogisms (if A then B), Jaynes warns us to distinguish between conditional statements of a purely formal character (the material conditional) and those which assert a logical connection.
It seems to me that the weak syllogisms only “do work” when the conditional premise is true due to a logical connection between antecedent and consequent. If no such connection exists, or rather, if our mind cannot establish such a connection, then the plausibility of the antecedent doesn’t change upon learning the consequent.
For example, “if the garbage can is green then frogs are amphibians” is true (since frogs are amphibians), but this fact about frogs does not increase (or decrease) the probability that the garbage can is green, since presumably most of us see no connection between the two propositions.
At some point in learning logic, I think I kind of lost touch with the common language use of conditionals as asserting connections. I like that Jaynes reminds us of the distinction.
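A small worked sketch of that claim (my own numbers, purely illustrative): by Bayes’ rule, learning the consequent B changes the plausibility of the antecedent A only to the extent that P(B|A) differs from P(B|not A). If the two propositions are unconnected, as with the garbage can and the frogs, the update is exactly zero.

```python
def posterior_A_given_B(p_A, p_B_given_A, p_B_given_notA):
    """Bayes' rule: P(A|B) from a prior on A and the two likelihoods."""
    p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
    return p_B_given_A * p_A / p_B

# A genuine logical connection: B is much more likely when A holds,
# so learning B pulls the plausibility of A upward.
print(posterior_A_given_B(0.3, 0.9, 0.2))   # ~0.66, up from 0.3

# No connection at all (green garbage can vs. amphibian frogs):
# P(B|A) = P(B|not A), so learning B leaves A untouched.
print(posterior_A_given_B(0.3, 0.9, 0.9))   # 0.3 exactly
```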
What do you make of Jaynes’ observation that plausible inference is concerned with logical connections, and must be carefully distinguished from physical causation?

His example of rain at 10:30 implying clouds at 10:15, with any physical causation running in the other direction, is clear. And I appreciate his polemic that limiting yourself to reasoning based upon physical cause and effect is dull and impractical. He was a physicist, and the ideal of physicists is to discover previously unknown natural laws of cause and effect; this made him a bit eccentric within his own community, and so we get the pleading tone in there. It is a minor distraction in the midst of great material.
57 participants should make for a sustained critical mass even with heavy attrition.
It occurs to me that Jaynes is missing a desideratum that I might have included. I can’t decide if it’s completely trivial, or if perhaps it’s covered implicitly in his consistency rule 3c; I expect it will become clear as the discussion becomes more formal—and of course, he did promise that the rules given would turn out to be sufficient. To wit:
The robot should not assign plausibilities arbitrarily. If the robot has plausibilities for propositions A and B such that the plausibility of A is independent of the plausibility of B, and the plausibility of A is updated, then the degree of plausibility for B should remain constant barring other updates.
One more thing. The footnote on page 12 wonders: Does it follow that AND and NOT (or NAND alone) are sufficient to write any computer program?
Isn’t this trivial? Since AND and NOT can together be composed to represent any logic function, and a logic function can be interpreted as a function from some number of bits (the truth values of the variable propositions) to one result bit, it follows that we can write programs with AND and NOT that make any bits in our computer an arbitrary function of any of the other bits. Is there some complication I’m missing?
(Edited slightly for clarity.)
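For what it’s worth, here is a minimal sketch of why the truth-table argument goes through (my own illustration, not Jaynes’): any Boolean function can be written in disjunctive normal form, and the OR can itself be eliminated with De Morgan’s law, leaving only AND and NOT.

```python
from itertools import product

def AND(a, b): return a and b
def NOT(a):    return not a
def OR(a, b):  return NOT(AND(NOT(a), NOT(b)))   # De Morgan: only AND and NOT

def build_from_truth_table(table):
    """Return a function realizing the given truth table, using only AND/NOT
    (OR is itself built from them above).

    table: dict mapping each input tuple of bools to the desired output bit.
    """
    def f(*bits):
        result = False
        for row, out in table.items():
            if not out:
                continue
            # this term is true exactly when the inputs match this row
            term = True
            for bit, wanted in zip(bits, row):
                term = AND(term, bit if wanted else NOT(bit))
            result = OR(result, term)
        return result
    return f

# Example: 3-input XOR (parity), built only from AND/NOT.
xor_table = {bits: (sum(bits) % 2 == 1)
             for bits in product([False, True], repeat=3)}
xor = build_from_truth_table(xor_table)
assert all(xor(*bits) == xor_table[bits] for bits in xor_table)
```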
You can use NAND to implement any algorithm that has a finite upper time bound, but not “any computer program”, since a logical formula can’t express recursion.
Does that mean that digital-electronic NANDs, which can be used to build flip-flops, registers, etc., cannot be expressed in a logical formula?
Electronic NAND gates have a nonzero time delay. This allows you to connect them in cyclic graphs to implement loops.
You can model such a circuit using a set of logical formulae that has one logical NAND per gate per timestep. Ata pointed out that you need an infinitely large set of logical formulae if you want to model an arbitrarily long computation this way, though you can compress it back down to a finite description if you’re willing to extend the notation a bit, so you might not consider that a problem.
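A toy sketch of that “one NAND per gate per timestep” picture (my own construction, not from the thread): two cross-coupled NAND gates form an SR latch, and unrolling the feedback one timestep at a time lets ordinary feed-forward formulae describe its behavior.

```python
def NAND(a, b):
    return not (a and b)

def sr_latch_step(state, set_, reset_):
    """One timestep of a cross-coupled NAND latch (active-low S and R inputs).

    state = (q, q_bar) at time t; returns (q, q_bar) at time t+1.
    Each new output is a single NAND of values available at time t.
    """
    q, q_bar = state
    return (NAND(set_, q_bar), NAND(reset_, q))

state = (False, True)
# Hold S low (active) for two steps so the pair settles, then release it.
for s, r in [(False, True), (False, True), (True, True), (True, True)]:
    state = sr_latch_step(state, s, r)
    print(state)
# The latch ends up with q = True and keeps it after the pulse ends --
# memory, which no single feed-forward formula over the current inputs
# alone can express, but an unrolled per-timestep formula can.
```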
I agree that you are correct. Thank you.
Not sure I see what you mean. Do you have an example?
I think I was unclear. Here’s what I mean:
Suppose our robot takes these two propositions:
A = “It’s going to rain tonight in Michigan.”
B = “England will win the World Cup.”
And suppose it thinks that the plausibility of A is 40, and the plausibility of B is 25.
As far as our robot knows, these propositions are not related. That is, in Jaynes’ notation (I’ll use a bang for “not,”) (A|B) = (A|!B) = 40, and (B|A) = (B|!A) = 25. Is that correct?
Now suppose that the plausibility of A jumps to 80, because it’s looking very cloudy this afternoon. I suggest that the plausibility of B should remain unchanged. I’m not sure whether the current set of rules is sufficient to ensure that, although I suspect it is. I think it might be impossible to come up with a consistent system breaking this rule that still obeys the (3c) “consistency over equivalent problems” rule.
If you know from the outset that these propositions are unrelated, you already know something quite important about the logical structure of the world that these propositions describe.
Jaynes comes back to this point over and over again, and it’s also a major theme of the early chapters in Pearl’s Causality:
Probabilistic relationships, such as marginal and conditional independencies, may be helpful in hypothesizing initial causal structures from uncontrolled observations. However, once knowledge is cast in causal structure, those probabilistic relationships tend to be forgotten; whatever judgements people express about conditional independencies in a given domain are derived from the causal structure acquired. This explains why people feel confident asserting certain conditional independencies (e.g., that the price of beans in China is independent of the traffic in Los Angeles) having no idea whatsoever about the numerical probabilities involved (e.g., whether the price of beans will exceed $10 per bushel).

-- Pearl, Causality, p. 25
The way that you phrase this, “suppose the plausibility of A jumps to 80,” has no rigor. Depending on the way you choose to calculate this, it could lead to a change in B or not.
If we consider them independent, we could imagine 100 different worlds, and we would expect A to be true in 40 of these worlds, and so on, which would leave us with:
10 worlds where AB is true
30 worlds where A(!B) is true
15 worlds where (!A)B is true
45 worlds where (!A)(!B) is true
In general I would expect evidence to come in the form of determining that we are not in a certain world. If we determine that the probability of A rises, because we know ourselves not to be in any world where (!A)(!B) is true, then we would have to adjust the probability of B.
Your given reason, “because it’s looking very cloudy this afternoon,” would probably indicate that we are uniformly less likely to be in any given world where A is false. In this case, the plausibility of A should jump without affecting the plausibility of B.
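A small sketch to make the arithmetic of those two kinds of update concrete (my own code, reading the thread’s plausibilities of 40 and 80 as probabilities 0.40 and 0.80): ruling out the (!A)(!B) worlds shifts B, while uniformly down-weighting every world where A is false leaves B exactly where it was.

```python
# The 100 possible worlds from the table above (counts double as weights).
worlds = {
    ("A", "B"): 10,
    ("A", "!B"): 30,
    ("!A", "B"): 15,
    ("!A", "!B"): 45,
}

def prob(worlds, condition):
    total = sum(worlds.values())
    return sum(w for k, w in worlds.items() if condition(k)) / total

is_A = lambda k: k[0] == "A"
is_B = lambda k: k[1] == "B"

print(prob(worlds, is_A), prob(worlds, is_B))        # 0.4 0.25

# Update 1: evidence that rules out the (!A)(!B) worlds entirely.
ruled_out = {k: (0 if k == ("!A", "!B") else w) for k, w in worlds.items()}
print(prob(ruled_out, is_A), prob(ruled_out, is_B))  # ~0.73 ~0.45  (B moved)

# Update 2: evidence (clouds) that uniformly down-weights every !A world
# until P(A) reaches 0.8. Solve for the single scale factor c on !A worlds:
# 40 / (40 + 60*c) = 0.8  =>  c = 1/6.
c = 1 / 6
clouds = {k: (w if is_A(k) else w * c) for k, w in worlds.items()}
print(prob(clouds, is_A), prob(clouds, is_B))        # 0.8 0.25  (B unchanged)
```

The second update leaves B alone precisely because the reweighting depends only on A, and B has the same proportion (0.25) inside the A-worlds and the !A-worlds.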
So what I’m really saying is that there is no sense in which statements are independent, only a sense in which evidence is independent of statements.
However, a lot of this is speculation since it really isn’t addressed directly in the first chapter, as Christian points out.
I think it is impossible to decide this based on Chapter 1 alone, for the second criterion (qualitative correspondence with common sense) is not yet specified formally.
If you look into Chapter 2, at the derivation of the product rule, he uses this rubber assumption to get the results he aims for (very similarly to you).
I think one should not take some of the author’s statements (like “… our search for desiderata is at an end …”) too seriously.
In some sense this informal approach is defensible; from another perspective it definitely looks quite pretentious.
I don’t understand what you mean by “(B|A) = (B|A’)”.
Compare Jaynes’ framing of probability theory with your previous conceptions of “probability”. What are the differences?

Probability is a 1-to-1 mapping of plausibilities onto real numbers, rather than an objective, mind-independent thing waiting to be discovered.
What do you make of Jaynes’ observation that plausible inference is concerned with logical connections, and must be carefully distinguished from physical causation?

It seems quite reasonable. His storm cloud analogy works quite well.
I was particularly impressed with the “Comments” section after the chapter.