Why Google doesn’t care about hiring top graduates
“They, instead, commit the fundamental attribution error, which is if something good happens, it’s because I’m a genius. If something bad happens, it’s because someone’s an idiot or I didn’t get the resources or the market moved. … What we’ve seen is that the people who are the most successful here, who we want to hire, will have a fierce position. They’ll argue like hell. They’ll be zealots about their point of view. But then you say, ‘here’s a new fact,’ and they’ll go, ‘Oh, well, that changes things; you’re right.’”
Wouldn’t something good happening correctly result in a Bayesian update on the probability that you are a genius, and something bad a Bayesian update on the probability that someone is an idiot? (perhaps even you)
Yes, but if something good happens you have to update on the probability that someone besides you is a genius, and if something bad happens you have to update on the probability that you’re the idiot. The problem is people only update the parts that make them look better.
Yes, but the issue is whether or not those are the dominant hypotheses that come to mind. It’s better to see success and failure as results of plans and facts than of innate ability or disability.
Not without a causal link, the absence of which is conspicuous.
Not necessarily. Causation might not be present, true, but causation is not necessary for correlation, and statistical correlation is what Bayes is all about. Correlation often implies causation, and even when it doesn’t, it should still be respected as a real statistical phenomenon. All Jiro’s update would require is that P(success|genius) > P(success|~genius), which I don’t think is too hard to grant. It might not update enough to make the hypothesis the dominant hypothesis, true, but the update definitely occurs.
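To make the size of such an update concrete, here is a minimal sketch in Python. The prior and the likelihoods are made-up numbers for illustration, not estimates of anything:

```python
# Toy Bayesian update: observing a success raises P(genius) whenever
# P(success | genius) > P(success | ~genius). All numbers are made up.
prior_genius = 0.10           # P(genius)
p_success_given_genius = 0.6  # P(success | genius)
p_success_given_not = 0.4     # P(success | ~genius)

# Bayes' rule: P(genius | success) = P(success | genius) P(genius) / P(success)
p_success = (p_success_given_genius * prior_genius
             + p_success_given_not * (1 - prior_genius))
posterior_genius = p_success_given_genius * prior_genius / p_success

print(posterior_genius)  # ~0.143: a real update, but far from dominant
```

Note that the posterior moves from 0.10 to about 0.14: the update definitely occurs, but it does not by itself make “genius” the dominant hypothesis.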
“Because” (in the original quote) is about causality. Your inequality implies nothing causal without a lot of assumptions. I don’t understand what your setup is for increasing belief about a causal link based on an observed correlation (not saying it is impossible, but I think it would be helpful to be precise here).
Jiro’s comment is correct but a non-sequitur because he was (correctly) pointing out there is a dependence between success and genius that you can exploit to update. But that is not what the original quote was talking about at all, it was talking about an incorrect, self-serving assignment of a causal link in a complicated situation.
“Because” (in the original quote) is about causality. Your inequality implies nothing causal without a lot of assumptions.
Yes, naturally. I suppose I should have made myself a little clearer there; I was not making any reference to the original quote, but rather to Jiro’s comment, which makes no mention of causation, only Bayesian updates.
I don’t understand what your setup is for increasing belief about a causal link based on an observed correlation (not saying it is impossible, but I think it would be helpful to be precise here).
Because P(causation|correlation) > P(causation|~correlation). That is, it’s more likely that a causal link exists if you see a correlation than if you don’t see a correlation.
As for your second paragraph, Jiro himself/herself has come to clarify, so I don’t think it’s necessary (for me) to continue that particular discussion.
Because P(causation|correlation) > P(causation|~correlation). That is, it’s more likely that a causal link exists if you see a correlation than if you don’t see a correlation.
Where are you getting this? What are the numerical values of those probabilities?
You can have presence or absence of a correlation between A and B, coexisting with presence or absence of a causal arrow between A and B. All four combinations occur in ordinary, everyday phenomena.
I cannot see how to define, let alone measure, probabilities P(causation|correlation) and P(causation|~correlation) over all possible phenomena.
I also don’t know what distinction you intend in other comments in this thread between “correlation” and “real correlation”. This is what I understand by “correlation”, and there is nothing I would contrast with this and call “real correlation”.
You can have presence or absence of a correlation between A and B, coexisting with presence or absence of a causal arrow between A and B. All four combinations occur in ordinary, everyday phenomena.
Do you think it is literally equally likely that causation exists if you observe a correlation, and if you don’t? That observing the presence or absence of a correlation should not change your probability estimate of a causal link at all? If not, then you acknowledge that P(causation|correlation) != P(causation|~correlation). Then it’s just a question of which probability is greater. I assert that, intuitively, the former seems likely to be greater.
I also don’t know what distinction you intend in other comments in this thread between “correlation” and “real correlation”. This is what I understand by “correlation”, and there is nothing I would contrast with this and call “real correlation”.
By “real correlation” I mean a correlation that is not simply an artifact of your statistical analysis, but is actually “present in the data”, so to speak. Let me know if you still find this unclear. (For some examples of “unreal” correlations, take a look here.)
Do you think it is literally equally likely that causation exists if you observe a correlation, and if you don’t?
I think I have no way of assigning numbers to the quantities P(causation|correlation) and P(causation|~correlation) assessed over all examples of pairs of variables. If you do, tell me what numbers you get.
I assert that, intuitively, the former seems likely to be greater.
I asked why and you have said “intuition”, which means that you don’t know why.
My belief is different, but I also know why I hold it. Leaping from correlation to causation is never justified without reasons other than the correlation itself, reasons specific to the particular quantities being studied. Examples such as the one you just linked to illustrate why. There is no end of correlations that exist without a causal arrow between the two quantities. Merely observing a correlation tells you nothing about whether such an arrow exists. For what it’s worth, I believe that is in accordance with the views of statisticians generally. If you want to overturn basic knowledge in statistics, you will need a lot more than a pronouncement of your intuition.
By “real correlation” I mean a correlation that is not simply an artifact of your statistical analysis, but is actually “present in the data”, so to speak.
A correlation (or any other measure of statistical dependence) is something computed from the data. There is no such thing as a correlation not “present in the data”.
What I think you mean by a “real correlation” seems to be an actual causal link, but that reduces your claim that “real correlation” implies causation to a tautology. What observations would you undertake to determine whether a correlation is, in your terms, a “real” correlation?
I think I have no way of assigning numbers to the quantities P(causation|correlation) and P(causation|~correlation) assessed over all examples of pairs of variables. If you do, tell me what numbers you get.
My original question was whether you think the probabilities are equal. This reply does not appear to address that question. Even if you have no way of assigning numbers, that does not imply that the three possibilities (>, =, <) are equally likely. Let’s say we somehow did find those probabilities. Would you be willing to say, right now, that they would turn out to be equal (with high probability)?
I asked why and you have said “intuition”, which means that you don’t know why.
Okay, here’s my reasoning (which I thought was intuitively obvious, hence the talk of “intuition”, but illusion of transparency, I guess):
The presence of a correlation between two variables means (among other things) that those two variables are statistically dependent. There are many ways for variables to be dependent, one of which is causation. When you observe that a correlation is present, you are effectively eliminating the possibility that the variables are independent. With this possibility gone, the remaining possibilities must increase in probability mass, i.e. become more likely, if we still want the total to sum to 1. This includes the possibility of causation. Thus, the probability of some causal link existing is higher after we observe a correlation than before: P(causation|correlation) > P(causation|~correlation).
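Here is the same argument as a toy calculation. It assumes a made-up uniform prior over four hypotheses and, crucially, that observed dependence rules out the “independent” hypothesis—exactly the faithfulness assumption the reply below takes issue with:

```python
# A sketch of the probability-mass argument as stated. The uniform
# prior over four toy hypotheses is an assumption for illustration.
prior = {
    "independent": 0.25,
    "A_causes_B": 0.25,
    "B_causes_A": 0.25,
    "common_cause": 0.25,
}

# Observing a dependence is taken to eliminate "independent";
# the remaining probability mass is renormalized to sum to 1.
post = {h: p for h, p in prior.items() if h != "independent"}
total = sum(post.values())
post = {h: p / total for h, p in post.items()}

p_link_before = prior["A_causes_B"] + prior["B_causes_A"]  # 0.5
p_link_after = post["A_causes_B"] + post["B_causes_A"]     # 2/3
print(p_link_before, p_link_after)
```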
There is no such thing as a correlation not “present in the data”.
If you are using a flawed or unsuitable analysis method, it is very possible for you to (seemingly) get a correlation when in fact no such correlation exists. An example of such a flawed method may be found here, where a correlation is found between ratios of quantities despite those quantities being statistically independent, thus giving the false impression that a correlation is present when it is actually not.
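This artifact is easy to reproduce. In the sketch below (assuming numpy), X, Y, and Z are independent by construction, yet the ratios X/Z and Y/Z come out strongly correlated:

```python
# Pearson's "spurious correlation" of ratios: X, Y, Z are independent,
# but dividing by the common denominator Z manufactures a correlation.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.uniform(1, 2, n)
y = rng.uniform(1, 2, n)
z = rng.uniform(1, 2, n)

print(np.corrcoef(x, y)[0, 1])          # ~0: genuinely independent
print(np.corrcoef(x / z, y / z)[0, 1])  # ~0.5: an artifact of the ratios
```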
What observations would you undertake to determine whether a correlation is, in your terms, a “real” correlation?
As I suggested in my reply to Lumifer, redundancy helps.
Sorry it’s taken me so long to get back to this.
Okay, here’s my reasoning (which I thought was intuitively obvious, hence the talk of “intuition”, but illusion of transparency, I guess):
The illusion of transparency applies not only to explaining things to other people, but to explaining things to oneself.
The presence of a correlation between two variables means (among other things) that those two variables are statistically dependent. There are many ways for variables to be dependent, one of which is causation. When you observe that a correlation is present, you are effectively eliminating the possibility that the variables are independent. With this possibility gone, the remaining possibilities must increase in probability mass, i.e. become more likely, if we still want the total to sum to 1. This includes the possibility of causation. Thus, the probability of some causal link existing is higher after we observe a correlation than before: P(causation|correlation) > P(causation|~correlation).
The argument still does not work. Statistical independence does not imply causal independence. In causal reasoning the idea that it does is called the assumption or axiom of faithfulness, and there are at least two reasons why it may fail. Firstly, the finiteness of sample sizes means that observations can never prove statistical independence, only put likely upper bounds on its magnitude. As Andrew Gelman has put it, with enough data, nothing is independent. Secondly, dynamical systems and systems of cyclic causation are capable of producing robust statistical independence of variables that are directly causally related. There may be reasons for expecting faithfulness to hold in a specific situation, but it cannot be regarded as a physical law true always and everywhere.
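A related way for causally linked variables to come out statistically independent is exact path cancellation, which is simple to exhibit in simulation (a sketch, not one of the two mechanisms named above):

```python
# Faithfulness violation by path cancellation: A directly causes B,
# yet corr(A, B) ~ 0 because the direct effect (+1) is cancelled by
# the indirect effect through C (A -> C -> B, net -1).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
a = rng.normal(size=n)
c = a + rng.normal(size=n)      # A -> C
b = a - c + rng.normal(size=n)  # A -> B (direct, +1) and C -> B (-1)

print(np.corrcoef(a, b)[0, 1])  # ~0 despite the direct causal arrow
```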
Even when faithfulness does hold, statistical dependence tells you only that either causation or selection is happening somewhere. If your observations are selected on a common effect of the two variables, you may observe correlation when the variables are causally independent. If you have reason to think that selection is absent, you still have to decide whether you are looking at one variable causing the other, both being effects of common causes, or a combination.
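Selection on a common effect is likewise easy to demonstrate. In this sketch X and Y are independent, but restricting attention to cases where their common effect is large induces a correlation:

```python
# Selection bias: X and Y are causally and statistically independent,
# but conditioning on their common effect S = X + Y being large
# produces a correlation in the selected subsample.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = rng.normal(size=n)
selected = (x + y) > 1  # observations selected on a common effect

print(np.corrcoef(x, y)[0, 1])                      # ~0 in the full data
print(np.corrcoef(x[selected], y[selected])[0, 1])  # clearly negative
```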
Given all of these complications, which in a real application of statistics you would have to have thought about before even collecting any data, the argument that correlation is evidence for causation, in the absence of any other information about the variables, has no role to play. The supposed conclusion that P(causation|correlation) > P(causation|~correlation) is useless unless there is reason to think that the difference in probabilities is substantial, which is something you have not addressed, and which would require coming up with something like actual values for the probabilities.
Redundancy helps. Use multiple analysis methods, show someone else your results, etc. If everything turns out the way it’s supposed to, then that’s strong evidence that the correlation is “real”.
This is too vague to be helpful. What multiple analysis methods? The correlation coefficient simply is what it is. There are other statistics you can calculate for statistical dependency in general, but they are subject to the same problem as correlation: none of them imply causation. What does showing someone else your results accomplish? What are you expecting them to do that you did not? What is “the way everything is supposed to turn out”?
What, in concrete terms, would you do to determine the causal efficacy of a medication? You won’t get anywhere trying to publish results with no better argument than “correlation raises the probability of causation”.
How will you be able to distinguish between the two?
You also seem to be using the word “correlation” to mean “any kind of relationship or dependency”, which is not what it normally means.
Redundancy helps. Use multiple analysis methods, show someone else your results, etc. If everything turns out the way it’s supposed to, then that’s strong evidence that the correlation is “real”.
EDIT: It appears I’ve been ninja’d. Yes, I am not using the term “correlation” in the technical sense, but in the colloquial sense of “any dependency”. Sorry if that’s been making things unclear.
I still don’t understand in what sense you are using the word “real” in ‘correlation is “real”’.
Let’s say you have two time series 100 data points in length each. You calculate their correlation, say, Pearson’s correlation. It’s a number. In which sense can that number be “real” or “not real”?
Do you implicitly have in mind sampling theory, where what you observe is a sample estimate and what’s “real” is the true parameter of the unobserved underlying process? In that case there is a very large body of research, mostly going by the name of “frequentist statistics”, about figuring out what your sample estimate tells you about the unobserved true value (to call which “real” is a bit of a stretch, since normally it is not real).
It seems as though my attempts to define my term intensionally aren’t working, so I’ll try and give an extensional definition instead:
An example would be that site you linked earlier. Those quantities appear to be correlated, but the correlations are not “real”.
So you are using “real” in the sense of “matching my current ideas of what’s likely”. I think this approach is likely to… lead to problems.
Er… no. Okay, look, here’s the definition I provided from an earlier comment:
By “real correlation” I mean a correlation that is not simply an artifact of your statistical analysis, but is actually “present in the data”, so to speak.
You seemed to understand this well enough to engage with it, even going so far as to ask me how I would distinguish between the two (answer: redundancy), but now you’re saying that I’m using “real” to mean “matching my current ideas of what’s likely”? If there’s something in the quote that you don’t understand, please feel free to ask, but right now I’m feeling a bit bewildered by the fact that you seem to have entirely forgotten that definition.
See also: spurious correlation.
Sigh.
All measured correlations are “actually present in the data”. If you take two data series and calculate their correlation it would be a number. This measured (or sample) correlation is certainly real and not fake. The question is what it represents.
You claim the ability to decide—on a basis completely unclear to me—that sometimes this measured correlation represents something (and then you call it “real”) and sometimes it represents nothing (and then you call it “not real”). “Redundancy” is not an adequate answer because all it means is that you will re-measure your sample again and, not surprisingly, will get similar results because it’s still the same data. As an example of “not real” correlation you offered the graphs from the linked page, but I see no reason for you to declare them “not real” other than that they do not look likely to you.
All measured correlations are “actually present in the data”. If you take two data series and calculate their correlation it would be a number. This measured (or sample) correlation is certainly real and not fake. The question is what it represents.
Depending on which statistical method you use, the number you calculate may not be the number you’re looking for, or the number you’d have gotten had you used some other method. If you don’t like my use of the word “real” to denote this, feel free to substitute some other word—“representative”, maybe. By “redundancy” I’m not referring to the act of analyzing the data multiple times; I’m referring to using multiple methods to do so and seeing if you get the same result each time (possibly checking with a friend or two in the process).
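For what it’s worth, here is one concrete reading of “multiple methods”, assuming scipy: compute several different dependence statistics on the same data and check that they roughly agree. Agreement is evidence against an estimator-specific artifact; it still says nothing about causation:

```python
# Three different dependence statistics on the same data. Rough
# agreement suggests the dependence is not an artifact of one method.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y = 0.5 * x + rng.normal(size=1_000)

print(stats.pearsonr(x, y))    # linear (Pearson) correlation
print(stats.spearmanr(x, y))   # rank (Spearman) correlation
print(stats.kendalltau(x, y))  # Kendall's tau, another rank measure
```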
As an example of “not real” correlation you offered the graphs from the linked page, but I see no reason for you to declare them “not real” other than because it does not look likely to you.
No, I am declaring them “not real” because they were calculated using a statistical method widely regarded as suspect. This suspect method is known to produce correlations that are called “spurious”, and my link in the grandparent comment was to this method’s Wikipedia page. I’m not sure if you thought the link I provided led to the original page you linked, but as you made no mention of “spurious correlations” (the method, not the page), I thought I’d mention it again.
The quote about causality is a characterization of an opponent’s view. I was suggesting that the quote’s author may have mischaracterized his opponent’s view by interpreting a Bayesian update as an assertion of causality.
No, I don’t think so at all.
Bayes is about updating your estimates on the basis of new data points. You are not required to be stupid about it.
At a cursory glance, that site you linked does not appear to give any information on how it’s generating those correlations, but the term “spurious correlation” actually has a specific meaning. Essentially, one can make even statistically independent variables appear correlated by dividing each of them by a common third variable; the resulting ratios will be correlated even though the original variables are not. It should go without saying that you should make sure your correlations are actual correlations rather than mere artifacts of your analysis method. As it is, the first thing I’d do is question the validity of those correlations.
However, if the correlations actually are real, then I’d argue that they do constitute Bayesian evidence. The problem is that said evidence will likely be “drowned out” in a sea of much more convincing evidence. That said, the evidence still exists; you just happen to be updating on other, potentially far stronger, pieces of evidence as well. So “You are not required to be stupid about it” is just the observation that you should take into account other forms of evidence when performing a Bayesian update, specifically (in this case) the plausibility of the claim (because plausibility correlates semi-strongly with truth). And to that I have but one thing to say: duh!
Bayes is definitely about statistical correlation. You can call it “updating your estimates on the basis of new data points” if you want, but it’s still all probabilities—and you need correlations for those. For example: if you don’t know how much phenomenon A correlates with phenomenon B, how are you supposed to calculate the conditional probabilities P(A|B) and P(B|A)?
Bayes is definitely about statistical correlation.
No, I strongly disagree.
it’s still all probabilities—and you need correlations for those
I do not need correlations for probabilities—where did you get that strange idea?
To make a simple observation, “correlation” measures a linear relationship, and there are many things in this world that are dependent in more complex ways. Are you familiar with Anscombe’s quartet, by the way?
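A one-line illustration of the point: below, Y is a deterministic function of X, so the two are as dependent as it gets, yet Pearson’s correlation is essentially zero because the relationship is symmetric rather than linear.

```python
# Dependence without linear correlation: y = x^2 with x symmetric
# about zero gives corr(x, y) ~ 0 even though y is a function of x.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100_000)
y = x ** 2

print(np.corrcoef(x, y)[0, 1])  # ~0: Pearson's r misses the dependence
```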
I do not need correlations for probabilities—where did you get that strange idea?
In that case, I’ll repeat my earlier question:
if you don’t know how much phenomenon A correlates with phenomenon B, how are you supposed to calculate the conditional probabilities P(A|B) and P(B|A)?
There is no general answer—this question goes to why you consider a particular data point to be evidence suitable for updating your prior. Ideally you have causal (structural) knowledge about the relationship between A & B, but lacking that you should probably have some model (implicit or explicit) of that relationship. The relationship does not have to be linear and does not have to show up as correlation (though it, of course, might).
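As an aside, conditional probabilities themselves need no correlation coefficient; given a joint distribution (here, raw counts over made-up binary events), they can be estimated directly:

```python
# Estimating P(A|B) and P(B|A) from counts alone, with no correlation
# statistic anywhere. The binary events and their rates are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
b = rng.random(n) < 0.3                                    # P(B) = 0.3
a = np.where(b, rng.random(n) < 0.7, rng.random(n) < 0.2)  # P(A|B), P(A|~B)

print(a[b].mean())  # estimate of P(A | B), ~0.7
print(b[a].mean())  # estimate of P(B | A), ~0.6 by Bayes' rule
```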