I got all 10 right, but I had previously heard of similar results, so I don’t know whether I would have otherwise known to be that careful.
Of course, for questions where I really didn’t know the answer, I had to give a range spanning more than an order of magnitude to reliably hit the target (in some cases I could have tightened up my range, but in one case I only barely got it as it was); but I still think that’s better than giving confident but wrong answers.
Getting all 10 right just means you gave ranges that were too wide. You were asked to have 90% certainty, so the “perfect” score is 9 correct answers out of 10. :-)
If I got 9 right and someone else got all 10 right and gave narrower ranges than I did, I’d say he’s probably better at estimating than I am.
Better discrimination, but worse calibration (probably, low confidence since it’s only a single data point).
He’d be better at estimating the answers themselves, but worse at estimating his ability to estimate.
To be fair, 90% confidence means 90% on average. From one test like this, I’m not sure you could conclude much difference in the ability to estimate or synthesize confidence levels between people who score 8, 9, and 10. Indeed, because the test can be gamed by giving nine intervals from -inf to inf and one deliberately tight interval to force a 9, I would weight a 10 achieved with tighter bounds as better evidence of confidence estimation than a 9 achieved with wildly different or generally wider confidence bounds.
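For concreteness, here is the arithmetic behind that: if each of the ten intervals really does contain the answer with probability 0.9, the number of hits follows a Binomial(10, 0.9) distribution, so 8, 9, and 10 are all unremarkable scores for a well-calibrated guesser. A minimal sketch in Python:

```python
from math import comb

# If each of the ten intervals really contains the true value with
# probability 0.9, the number of hits is Binomial(n=10, p=0.9).
n, p = 10, 0.9
for k in range(6, 11):
    prob = comb(n, k) * p ** k * (1 - p) ** (n - k)
    print(f"P({k}/10 correct) = {prob:.3f}")

# Roughly: 6/10 -> 0.011, 7/10 -> 0.057, 8/10 -> 0.194,
#          9/10 -> 0.387, 10/10 -> 0.349
```

A perfectly calibrated estimator sweeps all ten about a third of the time, so a single 10/10 is only weak evidence that the ranges were too wide.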
But I almost got a couple wrong :)
I question that metric of ‘perfection’. I got said ‘perfect’ score by estimating, among other things, a blue whale weighing in at between 10 and 3^^^3 kg and a Sun with a surface temperature of negative one degree Kelvin.
That just means you lied to the test, which made it useless in determining your capacity to estimate certainty levels.
Make an honest attempt next time; then it will help you more.
No, what it means is that your description of the “perfect” score is wrong. Emphasis on “your” because the test itself makes no such declaration, leaving scope for a nuanced interpretation (as others have provided here).
It is not relevant (see above) but this too may be mistaken. Tests that are foiled by ‘lying to them’ are bad tests. Making a habit of engaging with them is detrimental to rational thinking. They measure and encourage the development of the ability to deceive oneself—a bias that comes naturally to humans. “Sincerity” is bullshit.
Really? What test can you imagine that checks your ability at anything which can’t be foiled by intentionally attempting to foil it?
A test that measures your speed at running can be foiled if you don’t run as fast as you can. A test that measures your ability to stand still can be foiled if you intentionally move. And a test that measures your intelligence can be foiled if you purposefully give it stupid answers. Which is what you did.
Perhaps you mean that this would be a bad test for someone to use to evaluate others, as people can also foil the test in an upwards direction, not just a downwards one.
Citation needed for the claim that making a habit of engaging with such tests is detrimental to rational thinking.
No, sincerity is the opposite of bullshit. I didn’t have much trouble typing the range I actually believed gave me roughly a 90% chance. You, on the other hand, chose to type nine ranges that gave a 100% chance and one range that gave a 0% chance.
So I was measurably, quantifiably, more sincere than you in my answers.
You are being silly. Self-sabotage is not what we are talking about here, and it is not relevant. In fact, if your definition of a ‘perfect score’ were actually what the test was asking for, then you would be self-sabotaging. See my previous support of the test itself and my advocacy of a more nuanced evaluation system than integer difference minimization.
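For illustration, one standard example of such a system is the interval (Winkler) score: it rewards narrow ranges but charges a 2/alpha penalty for any miss, so neither padding every range to infinity nor deliberately throwing one answer comes out ahead. This is only a sketch of the idea, not something the test in question actually computes, and the whale figures below are made up for the example:

```python
def interval_score(lower: float, upper: float, actual: float, alpha: float = 0.1) -> float:
    """Interval (Winkler) score for a central (1 - alpha) prediction interval.

    Lower is better: the width term rewards tight ranges, and the 2/alpha
    term penalizes misses, so honest 90% intervals minimize the expected score."""
    score = upper - lower                        # pay for the width of the range
    if actual < lower:
        score += (2 / alpha) * (lower - actual)  # miss below: pay 20x the shortfall
    elif actual > upper:
        score += (2 / alpha) * (actual - upper)  # miss above: pay 20x the overshoot
    return score

# Hypothetical blue-whale mass question, taking the true value to be ~1.4e5 kg:
print(interval_score(1e5, 2e5, 1.4e5))    # tight, correct range: 100000.0
print(interval_score(10, 1e12, 1.4e5))    # absurdly wide range that still hits: ~1e12
print(interval_score(-300, -1, 1.4e5))    # deliberately impossible range: ~2.8e6
```

Under a rule like this, the gamed 9/10 (nine unbounded ranges plus one guaranteed miss) scores far worse than ten honestly tight intervals.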
“Sincerity is bullshit.” is actually a direct quote from On Bullshit. The people here who use the term bullshit tend to mean it in the same sense described in that philosophical treatise.
I never reward people, even myself, for self-deception.
More than an order of magnitude! My answers often crossed six orders of magnitude, and I still only got 5/10!
My estimate for the volume of the Great Lakes spanned several orders of magnitude, because I multiplied the uncertainties in all three dimensions.
Which has relevance to real scenarios: an estimate with several independent uncertainties had better come with a range that is, if not strictly the product of all of them, at least wider than the range for an estimate with just one comparable uncertainty.
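To put a rough number on that: if each of the three dimensions is only pinned down to within a factor of ten, the naive product of the intervals spans three orders of magnitude, and even treating the three uncertainties as independent log-uniform draws leaves a central 90% range of well over a factor of ten. A sketch with made-up bounds, not actual Great Lakes figures:

```python
import random

# Hypothetical factor-of-ten bounds on each dimension (made-up numbers,
# not actual Great Lakes measurements).
bounds = [("length_km", 100, 1000), ("width_km", 30, 300), ("mean_depth_km", 0.01, 0.1)]

# Naive interval product: 10x uncertainty in each dimension multiplies out
# to a 1000x range, i.e. three orders of magnitude.
lo = hi = 1.0
for _, low, high in bounds:
    lo *= low
    hi *= high
print(f"interval product: {lo:.0f} to {hi:.0f} km^3 ({hi / lo:.0f}x)")

# Monte Carlo with independent log-uniform draws per dimension: the central
# 90% interval is narrower than the full product, but still spans well over 10x.
random.seed(0)
samples = []
for _ in range(100_000):
    v = 1.0
    for _, low, high in bounds:
        v *= low * (high / low) ** random.random()  # log-uniform draw on [low, high]
    samples.append(v)
samples.sort()
print(f"central 90% interval: {samples[5_000]:.0f} to {samples[95_000]:.0f} km^3")
```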