If a random 80% of suspects are guilty, the appropriate naive predictor is one that always votes “guilty”, not one that tries to match probabilities by choosing a random 80% of suspects to call guilty. Then you get an accurate result 80% of the time, which is a lot better than 68%. That seems to me a more appropriate benchmark.
(Alternatively, you might consider a predictor that matches its probabilities not to the proportion of defendants who are guilty but to the proportion who are convicted. There might be something to be said for that.)
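In case it helps to see where the 68% comes from, here's a minimal simulation sketch of the two naive benchmarks (assuming the 80% base rate above; the counts and variable names are just for illustration):

```python
import random

# Assumed base rate of guilt among suspects, per the discussion above.
BASE_RATE = 0.8
N = 100_000

# True = actually guilty, drawn at the base rate.
suspects = [random.random() < BASE_RATE for _ in range(N)]

# Benchmark 1: always vote "guilty" -> accuracy equals the base rate (~0.80).
always_guilty_acc = sum(suspects) / N

# Benchmark 2: probability matching -> independently call a random 80% guilty.
# Expected accuracy is 0.8*0.8 + 0.2*0.2 = 0.68.
matched_calls = [random.random() < BASE_RATE for _ in range(N)]
matching_acc = sum(g == m for g, m in zip(suspects, matched_calls)) / N

print(f"always-guilty accuracy:        {always_guilty_acc:.3f}")
print(f"probability-matching accuracy: {matching_acc:.3f}")
```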
I think the intended question is whether the legal system adds anything beyond a pure chance element. Somehow we’d need a gold standard of actually guilty and innocent suspects, then we’d need to measure whether p(guilty|convicted) > 80%. You could also ask if p(innocent|acquitted) > 20%, but that’s the same question.
Thank you! Intended or not, it’s a fantastic question, and I don’t know where to look up the answer. I’m not even sure that anyone has seriously tried to answer that question. If they haven’t, then I want to. I’ll look into it.
The closest thing I know of is the “actually innocent, but convicted” sample that gradually came to light under DNA testing of inmates. Unreported crime rates get estimated somehow, so I’d be surprised if nobody had combined those numbers to do a study in this vein. Haven’t found one with a cursory googling, though.
I don’t see how those are “the same question”. Suppose that out of 8 accused, 4 are guilty, two of them are convicted, and the rest are acquitted. Then p(guilty|convicted) = 1 and p(innocent|acquitted) = 2⁄3.
The assumption was that 80% of defendants are guilty, which is more than 4 of 8. Under this assumption, asking whether p(guilty|convicted) > 80% is just asking whether conviction positively correlates with guilt. Asking if p(innocent|acquitted) > 20% is just asking if acquittal positively correlates with innocence. These are really the same question, because P correlates with Q iff ¬P correlates with ¬Q.
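For what it's worth, here's a sketch of why that "iff" holds, writing G for guilt and C for conviction and using indicator variables:

```latex
\begin{align*}
P(G \mid C) > P(G)
  &\iff P(G \cap C) > P(G)\,P(C) \\
  &\iff \operatorname{Cov}(\mathbf{1}_G, \mathbf{1}_C) > 0 \\
  &\iff \operatorname{Cov}(1-\mathbf{1}_G,\, 1-\mathbf{1}_C) > 0
      && \text{(covariance is unchanged when both variables are negated)} \\
  &\iff P(\lnot G \cap \lnot C) > P(\lnot G)\,P(\lnot C) \\
  &\iff P(\lnot G \mid \lnot C) > P(\lnot G).
\end{align*}
```

So conviction raising the probability of guilt above 80% is exactly acquittal raising the probability of innocence above 20%.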
Perfect. Thanks.