Upvoted, because I think this is a naturally interesting topic and is always relevant on LessWrong. I particularly like the threshold optimization section—it is accessible for people who aren’t especially advanced in math, and sacrifices little in terms of flow and readability for the rigor gains.
I don’t agree that the cost of a false positive is negligible in general. For that to be true, search would have to be reliable and efficient, which among other things means we would need to know what brilliant looked like, in searchable terms, before we found it. That has not been my experience in any domain I have encountered; it reliably takes me years of repeated searches before I can identify the brilliant stuff. Search grants access to the best stuff eventually, but doesn’t do a good job of making the best stuff easy to find. That’s the job of a filter (or curator).
A second objection is that false positives can easily be outright harmful. For example, consider history: the 90% crap history is factually wrong along dimensions ranging from accidentally repeated folk wisdom to deliberate fabrications with malicious intent. Crap history directly causes false beliefs, which is very different from bad poetry, which is at worst aesthetically repulsive. Crap medical research, which I think in Sturgeon’s sense would include things like anti-vaccine websites and claims that essential oils cure cancer, causes beliefs that directly lead to suffering and death. This is weirdly also a search problem, since the worst things are free and easy for everyone to access, while the best things are normally gated behind journals or limited-access libraries.
On reflection, I conclude that the precision-recall tradeoff varies based on subject, and also separately based on search, and that both types of variance are large.
Crap medical research, which I think in Sturgeon’s sense would include things like anti-vaccine websites and claims that essential oils cure cancer, causes beliefs that directly lead to suffering and death.
I don’t think those are examples of medical research in Sturgeon’s sense. Outside of “essential oils cure cancer” blogs, half of the published cancer research doesn’t replicate, and I wouldn’t be surprised if the majority of the rest is useless for actually doing anything about cancer for other reasons.
The precision-recall tradeoff definitely varies from one task to another. I split tasks into “precision-maxxing” (where false positives are costlier than false negatives) and “recall-maxxing” (where false negatives are costlier than false positives).
I disagree with your estimate of the relative costs in history and in medical research. The truth is that academia does surprisingly well at separating the good from the bad.
Suppose I select two medical papers at random — one from the set of good medical papers, and one from the set of crap medical papers. If a wizard offered to permanently delete both papers from reality, that would rarely be a good deal because the benefit of deleting the crap paper is negligible compared to the cost of losing the good paper.
But what if the wizard offered to delete M crap papers and one good paper? How large must M be before this is a good deal? The minimum acceptable M is C_FN/C_FP, so τ⋆ = C_FP/(C_FP + C_FN) = 1/(1 + M). I’d guess that M is at least 30, so τ⋆ is at most about 3.2%.
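For concreteness, here is a minimal sketch of that threshold calculation (my own illustration, not from the original post; the function name is mine):

```python
# With false-positive cost C_FP and false-negative cost C_FN, the
# cost-minimizing acceptance threshold is tau* = C_FP / (C_FP + C_FN).
# Writing M = C_FN / C_FP (how many crap papers one good paper is
# worth) simplifies this to tau* = 1 / (1 + M).

def optimal_threshold(M: float) -> float:
    """Optimal threshold given the cost ratio M = C_FN / C_FP."""
    return 1.0 / (1.0 + M)

# With M = 30, the threshold is 1/31, roughly 3.2%.
print(optimal_threshold(30))
```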
Reviewing the examples in the post again, I think I was confused on first reading. I initially read the nuclear reactor example as being a completed version of the Michelangelo example, but now I see it clearly includes the harms issue I was thinking about.
I also think that the Library of Babel example contains my search thoughts, just not separated out in the same way as in the Poorly Calibrated Heuristics section.
I’m going to chalk this one up to an oops!