IlyaShpitser comments on True numbers and fake numbers

IlyaShpitser 6 Feb 2014 15:17 UTC
5 points
Not sure why all the obsession with ordering people from best to worst. Some people don’t think quickly on their feet, or are bad test takers, etc. Can’t we just look at what people have done?
- lmm 6 Feb 2014 15:25 UTC
  8 points
  Parent
  If you can find a good way to put a number on “what people have done”, sure. But if not, I need a way to answer questions like “would giving people chlorine supplements make them more effective at achieving their goals”, and score on IQ test is a) easy to calculate b) at least somewhat correlated with e.g. doing well at one’s job.
  - Lumifer 6 Feb 2014 16:15 UTC
    −4 points
    Parent
    
    is a) easy to calculate
    
    That’s actually a bad reason (see the Streetlight Fallacy or the availability bias in general).
    - Creutzer 6 Feb 2014 16:26 UTC
      4 points
      Parent
      Well, once you look at things a bit more subtly, the reason doesn’t seem quite so bad anymore. There is a balance to be struck between the accuracy of the measure and the easiness of calculating it. The most accurate measure is useless if you can’t calculate it. So what’s wrong with using a less accurate measure (but it still works to some extent—you mustn’t look at a) isolation from b)) which you can, in fact, calculate?
      - Lumifer 6 Feb 2014 16:32 UTC
        0 points
        Parent
        The need for balance is a fair point. But then you should make decisions about what to measure on the basis of an explicit trade-off between “easier to get” and “more relevant”—effectively you are using an estimate for the quantity you are really interested in and so you need some support for the notion that your estimate is a reasonable one.
- bramflakes 6 Feb 2014 15:56 UTC
  7 points
  Parent
  
  Some people don’t think quickly on their feet, or are bad test takers, etc. Can’t we just look at what people have done?
  
  The thing that makes IQ tests work is that these variables correlate to each other.
  - IlyaShpitser 6 Feb 2014 17:02 UTC
    5 points
    Parent
    Sure, but (a) correlation isn’t transitive, and (b) some very smart impressive people do poorly on metrics like IQ tests.
    
    I think cumulative lifetime output is better than IQ. Yes the former isn’t directly comparable across cohorts, and there are a million other complications. But trivial metrics like IQ seem to me to be looking under a streetlight.
    
    Google did some experiments on measurable ways to do interviews (puzzles, etc.) and found no effect on hire quality. Undue insistence on proxies over real parameters is a failure mode, imo.
    - gwern 6 Feb 2014 17:26 UTC
      18 points
      Parent
      
      Google did some experiments on measurable ways to do interviews (puzzles, etc.) and found no effect on hire quality.
      
      Unsurprising due to range restriction—by the time you’re interviewing with Google, you’ve gone through tons of filters (especially if you’re a Stanford grad). This is the same reason that when people look at samples of elite scientists, IQ tends to not be as important a factor as one would expect—because they’re all smart—and other things like personality factors start to correlate more.
      
      EDIT: this may be related to Spearman’s law of diminishing returns
      - IlyaShpitser 6 Feb 2014 17:40 UTC
        8 points
        Parent
        I am just saying that for people who are capable of doing more than flipping burgers (which probably starts well before a single sigma out from the mean), we should just look at what they did.
        
        This approach has the advantage of not counting highly the kind of people who may place well on tests, etc. due to good hardware, but who, due to poor habits or whatever other reason, end up not living up to their potential.
        
        Similarly, this approach highlights that creative output is often not comparable. Is Van Gogh “better” than Shakespeare? A silly question.
        
        I don’t disagree that IQ tests are useful for some things for folks within a sigma of the mean, and I also agree with the consensus that tests start to fail for smart folks, and we need better models then.
        
        If the average IQ of LW is really around 140, then I think we should talk about the neat things we have done, and not the average IQ of LW. :)
        DanArmak 7 Feb 2014 11:27 UTC
        2 points
        Parent
        Tests are often used to decide what to allow people to do, so you can’t rely on what they’ve done already. When testing applicants to college, they don’t often have a significant history of doing.
    - Richard_Kennaway 6 Feb 2014 17:53 UTC
      16 points
      Parent
      
      Google did some experiments on measurable ways to do interviews (puzzles, etc.) and found no effect on hire quality.
      
      But they only hire at the top, so one would expect the subsequent performance of their hires to be little correlated with any sort of interview assessments.
      
      Toy example: 0.8 correlation between two variables, select on one at 3 or more s.d.s above the mean, correlation within that subpopulation is around 0.2 to 0.45 (it varies a lot, even in a sample of 100000).
    - bramflakes 6 Feb 2014 17:09 UTC
      2 points
      Parent
      Of course outliers exist, they’re the exceptions that demonstrate the rule.
      
      Besides, how do you even define “cumulative lifetime output”? Galton tried doing this at first, then realized it was impossible to make rigorous, which led to his proto-IQ tests in the first place.
      - IlyaShpitser 6 Feb 2014 17:18 UTC
        1 point
        Parent
        I think if the real parameter is hard to measure or is maybe actually multiple parameters the correct answer is to think about modeling harder, not to insist that a dumb model is what we should use.
        
        In less quantititative fields that need to be a little quantitative to publish they have a silly habit of slapping a linear regression model on their problem and calling it a day.
        
        Besides, how do you even define “cumulative lifetime output”?
        
        Papers, books, paintings, creating output? Do you think Van Gogh and his ilk would do well on an IQ test?
        Creutzer 6 Feb 2014 17:23 UTC
        0 points
        Parent
        Cumulative lifetime output doesn’t seem very useful, though. For one thing, it’s only measurable for dead or near-dead people...
        IlyaShpitser 6 Feb 2014 17:26 UTC
        4 points
        Parent
        ???
        
        Cumulative just means “what you have done so far.”
        Creutzer 6 Feb 2014 17:29 UTC
        4 points
        Parent
        You’re right, of course. Nevermind. Though the problem of measuring it for someone who hasn’t yet had the chance to do much remains.
        [deleted] 6 Feb 2014 17:26 UTC
        0 points
        Parent
        Expected cumulative lifetime output, then.
        
        Two papers per year * 30 years of productive career = 60 papers.… :-(
        VAuroch 6 Feb 2014 19:17 UTC
        2 points
        Parent
        Most people, unlike you (according to your name, at least), are not paper machines.
        
        Someone who works in a large department that values number of papers published and number of grants secured but doesn’t particularly care about quality of work, and so publishes four papers a year of poor quality, which are occasionally cited, but only by direct colleagues, vs. Douglas Hoftstadter, who rarely publishes anything but whose first work has been immensely influential, you’re going to get a worse picture than if you had just used IQ.
        [deleted] 6 Feb 2014 21:21 UTC
        4 points
        Parent
        Heh, I suppose that is one of the alternative readings of my handle.
        
        Someone who … publishes four papers a year of poor quality, which are occasionally cited, but only by direct colleagues....
        
        Only four? Why, I know some (who will remain nameless) that published eight or ten papers last year alone.
        
        But of course Goodhart’s law ruins everything.
    - Lumifer 6 Feb 2014 17:20 UTC
      −2 points
      Parent
      
      I think cumulative lifetime output is better than IQ.
      
      For which purpose?
      - IlyaShpitser 6 Feb 2014 17:25 UTC
        0 points
        Parent
        For determining if someone is a giant, or a midget in giant’s clothing.
        Lumifer 6 Feb 2014 17:44 UTC
        2 points
        Parent
        That’s not really useful. I don’t even know in which context are we talking about these things. Is it about hiring someone? Is it about deciding on whether someone’s work is “worthy” to be in a museum, or published, or something else? It is about admitting people to your social circle? Is it about generally ranking all people on some scale just because we can?
        DanArmak 7 Feb 2014 11:28 UTC
        0 points
        Parent
        But that’s a test you can only run once somebody is dead, so it’s not very useful.
- fubarobfusco 7 Feb 2014 2:31 UTC
  −2 points
  Parent
  
  Not sure why all the obsession with ordering people from best to worst.
  
  Anyone who is using IQ as a proxy for the expected value of allying with the measured person (e.g. befriending them; employing them) is making a vast mistake, yes. We should expect moral factors such as trustworthiness, and (for many sorts of alliance) emotional factors such as warmth, to have a pretty huge effect on the value of an alliance.
  
  “Anne is smarter than Becky. Which one do you want as your best friend?”
  ″I don’t know, is Anne an asshole?”
  - Douglas_Knight 7 Feb 2014 6:44 UTC
    1 point
    Parent
    This has been studied and the answer is that while conscientiousness/honesty is the second most useful trait to measure when hiring, it is less valuable than measuring IQ.
  - ThisSpaceAvailable 8 Feb 2014 8:36 UTC
    0 points
    Parent
    But how can someone be trustworthey without intelligence? Even if they want to do what’s best, they can’t be relied to do what’s best if they can’t figure out what the best thing to do is. Generally speaking, the more intelligent someone is, the more predictable they are (there are a few exceptions, such s where a mixed strategy is optimal). The fact is that idiots can often screw things up more than selfish people. With a selfish person, all you have to worry about is “Is it in their interests to do what I want?” And if you’re not worrying about that to begin with, then perhaps you are being selfish.
    
    Intelligence is clearly correlated with expected value, and it’s definitely better than nothing at all. Furthermore, smart people are better than stupid people at convincing you that they’re smart. But honest people are often worse than dishonest people at convincing people that they’re honest.
    - fubarobfusco 8 Feb 2014 14:29 UTC
      0 points
      Parent
      A lot of this seems extremely contrary to my intuitions.
      
      Poor performance (for instance on tests) isn’t the result of having a high rate of random errors, but of exhibiting repeatable bugs. This means that people with worse performance will be more predictable, not less — in order to predict the better performance, you have to actually look at the universe more, whereas to predict the worse performance you only have to look at the agent’s move history.
      
      (For that matter, we can expect this from Bayes: if you’re learning poorly from your environment, you’re not updating, which means you’re generating behaviors based more on your priors alone.)
      
      The fact is that idiots can often screw things up more than selfish people.
      
      This seems to be a political tenet or tribal banner, not a self-evident fact.
      
      (Worse, it borders on the “intelligence leads necessarily to goodness” meme, which is a serious threat to AI safety. A more intelligent agent is better equipped to achieve its goals, but is not necessarily better to have around to achieve your goals if those are not the same.)
      - ThisSpaceAvailable 9 Feb 2014 7:22 UTC
        0 points
        Parent
        By more predictable, I meant greater accuracy in predicting, not that less computing power is required to predict. Someone who performs well on tests is perfectly predictable: they always get the right answer. Someone with poor performance can’t be any more predictable than that, and is often less.
        
        Just because the bug model has some value doesn’t mean that the error model has none. I would be surprised if a poorly performing student, given a test twice, were to give exactly the same wrong answers both times. I don’t understand you claim that people with worse performance would be more predictable. Given that someone is a good performer, all you need to do is solve the problem yourself, and assuming you did it correctly, you now know how that person would answer. To predict the worse performer, the move history is woefully inadequate. Poor performance is deterministic like a dice throw is deterministic. You need to know what their bugs are, what the exact conditions are, and how they’re approaching the problem. Someone who is using math will correctly evaluate 5(2+8) regardless of whether they find 2+8 first and then multiply by 5, or find 52 and 58 and add them together. But someone who doesn’t understand math will likely not only get the wrong answer, but get a different wrong answer depending on how they do the problem. Or just give up and give a random number. Just knowing how they did the problem before doesn’t tell you how they will do that exact problem in the future, and it certainly doesn’t allow you to extrapolate how they will do on other problems. If someone is doing math correctly, it doesn’t matter how they are implementing the math. But if they are doing it incorrectly, there are lots of different ways they can be doing it correctly, and given any particular problem, there are different wrong ways that get the same answer on that problem, but different answers on different problems. So just knowing what they got on one problem doesn’t distinguish between different wrong implementations.
        
        Learning poorly from your environment does not mean not updating, it means that you are updating poorly. Given the problem “d = rt, d = 20, r = 5”, if you tell a poor learner that the correct procedure is to divide 20 by 5 and get t = 4, then given the problem “d = rt, r = 6, t = 2″, they will likely divide 6 by 2 and get d = 3. They have observed that “divide the first number by the second one” is the correct procedure in one case, and incorrectly updated the prior on “always divide the first number by the second one”. To know what rule they’ve “learned”, you have to know what cases they’ve previously seen.
        
        Good learners don’t learn rules by Bayesian updating. They don’t learn “if you’re given d and r, you get t by dividing” by mindlessly observing instances and updating every time it gives the right answer. They learn it by understanding it. To know what rule a good learner has learned, you just need to know the correct rule; you don’t need to know what cases they’re seen.
        
        That there are some cases where idiots can screw things up more than selfish people is rather self-evident. “Can” does not border on “necessarily will”. Intelligence doesn’t lead to goodness in the sense of more desire to do good, but it does generally lead to goodness in the sense of more good being done.
        
        The whole point of an alliance is that you’re supposed to work together towards a common goal. If you’re trying to find stupid people so that you can have the upper hand in your dealings with them, that suggests that this isn’t really an “alliance”.