No. I’m talking about classes of errors.
As in, which is better?
1. A test that reports 100 false positives for every 100 false negatives for disease X
2. A test that reports 110 false positives for every 90 false negatives for disease X
The relative cost of false positives vs. false negatives is not defined automatically. If humans are closer to #1 than #2, and I develop a system like #2, I might define #2 to be better. Then later on down the line I stop talking about how I defined “better”, and I just use the word better, and no one questions it because hey… better is better, right?
Which is more costly, false positives or false negatives? This is an easy question to answer.
If false positives are more costly, #1 is better; if false negatives, #2. I really do not see what your point is. These problems you bring up are easily solved.
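For concreteness, a minimal sketch in Python of the arithmetic both comments are invoking (the per-error costs are hypothetical, which is exactly the disputed part):

```python
def total_cost(false_pos, false_neg, cost_fp, cost_fn):
    """Total cost of a test, given its error counts and per-error costs."""
    return false_pos * cost_fp + false_neg * cost_fn

test_1 = (100, 100)  # 100 false positives per 100 false negatives
test_2 = (110, 90)   # 110 false positives per 90 false negatives

# If a false positive costs more, #1 wins; if a false negative costs more,
# #2 wins; at a 1:1 ratio they tie. The comparison is trivial once the
# cost ratio is fixed; choosing the ratio is the hard part.
for cost_fp, cost_fn in [(2.0, 1.0), (1.0, 1.0), (1.0, 2.0)]:
    c1 = total_cost(*test_1, cost_fp, cost_fn)
    c2 = total_cost(*test_2, cost_fp, cost_fn)
    print(f"fp:fn cost {cost_fp}:{cost_fn} -> #1 costs {c1}, #2 costs {c2}")
```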
Which is better: Releasing a violent prisoner, or keeping a harmless one incarcerated? If you can find an answer that 90% of the population agrees on, then I think you’ve done better than every politician in history.
That people do NOT agree suggests to me that it’s hardly a trivial question...
How violent, how preventably violent, how harmless, how incarcerated, how long incarcerated? For any specific case in which these are agreed upon, I am confident a supermajority would agree.
That people don’t agree suggests one side is comparing releasing a serial killer to incarcerating a drifter in jail for a short while, and the other side is comparing releasing a middle-aged man who in a fit of passion struck his adulterous wife to incarcerating Gandhi for the term of his natural life. More generally, each is deciding based on one specific example that is vividly available to them.
As you phrased it, that question is about as answerable as “how long is a piece of string?”.
Yes. Thank you. Since at least one person understood me, I’m gonna jump off the merry-go-round at this point.
(For reference, I realize an expert runs into the same issue; I just think it’s unfair to say that the issue is “easily solved”.)
Many tests have a continuous, adjustable parameter for sensitivity, letting you set the trade-off however you want. In that case, we can refrain from judging the relative badness of false positives and false negatives, and use the area under the ROC curve (AUC), which is basically the integral over all such trade-offs. Tests that are going to be combined into a larger predictor are usually measured this way.
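As an illustration of “the integral over all such trade-offs”, here is a small self-contained Python sketch (with invented labels and scores, not from any real test): sweep the decision threshold across the scores, trace out the ROC curve, and take the area under it with the trapezoidal rule.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve, computed by sweeping the threshold."""
    order = np.argsort(-scores)  # sort by descending score
    labels = labels[order]
    tps = np.cumsum(labels)      # true positives as the threshold loosens
    fps = np.cumsum(1 - labels)  # false positives likewise
    tpr = np.concatenate(([0.0], tps / tps[-1]))  # sensitivity
    fpr = np.concatenate(([0.0], fps / fps[-1]))  # 1 - specificity
    # Trapezoidal integration of TPR over FPR.
    return np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
# Invented scores: informative but noisy, so AUC lands between 0.5 and 1.
scores = labels + rng.normal(scale=1.5, size=1000)
print(f"AUC = {roc_auc(labels, scores):.3f}")
```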
Machine learning packages generally let you specify a “cost matrix”: the cost of each possible confusion. For a 2-valued test, it is a 2x2 matrix with zeroes on the diagonal, and the costs of the A->B and B->A errors in the other two spots. For a test with N possible results, the matrix is NxN, with zeroes on the diagonal, and each (row, col) position holds the cost of a mistake that confuses the result corresponding to that row with the result corresponding to that column.
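A small sketch of how such a cost matrix gets used at decision time, in plain numpy rather than any particular package’s API (the 5:1 cost ratio is invented for illustration): given the classifier’s probabilities for each true class, predict whichever class minimizes expected cost.

```python
import numpy as np

# 2x2 cost matrix for a binary test: rows are the true class, columns the
# predicted class. Zeroes on the diagonal (correct answers); off-diagonal
# entries price each confusion. Here a false negative (true 1, predicted 0)
# is assumed to be five times as bad as a false positive.
cost = np.array([[0.0, 1.0],   # true 0: correct, false positive
                 [5.0, 0.0]])  # true 1: false negative, correct

def decide(p_true):
    """Pick the prediction that minimizes expected cost.

    expected_cost[j] = sum_i P(true = i) * cost[i, j]
    """
    return int(np.argmin(p_true @ cost))

# With this lopsided matrix the break-even probability is 1/6, not 1/2:
# the test should say "positive" even when positives are fairly unlikely.
for p1 in (0.1, 0.2, 0.5):
    print(f"P(positive) = {p1}: predict {decide(np.array([1 - p1, p1]))}")
```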