Re-reading that Sesardic paper set me thinking about further issues relating to the use of statistical evidence in criminal cases. In this post Yudkowsky points out that whilst all legal evidence should ideally be rational evidence, not all rational evidence is suitable as legal evidence. This is because certain sources of rational evidence would become systematically corrupted, and cease to function as such, if they were liable to be used as legal evidence (his example is the police commissioner’s confidential disclosure to a friend of the identity of the city’s crime boss).
In his paper, Sesardic calculates (using the same statistical sources chosen by the statisticians he is criticising) that the prior probability of a mother such as Sally Clark, who has had two infants die in succession for no apparent medical reason, being guilty of double murder is 25 times greater than the prior probability of her children having both died innocently through “SIDS” (ignoring the probability of one infant having died of SIDS and the other having been murdered, which is very small and irrelevant to the analysis). This is before he gets to the Bayesian effect of the evidence from the specific case, which turns out to shift the odds further towards the double murder hypothesis and away from the double SIDS hypothesis.
Clearly (if Sesardic is convincing) Sally Clark should have been found guilty. But what if the evidence from the alleged crime scenes had been indecisive, i.e. the likelihood ratio were ~1? In that case the posterior odds would simply equal the prior odds of about 25:1, so Bayes’s Theorem tells us that Clark is very probably guilty, but this is essentially a judgement based on statistics alone. Would it be proper for courts to convict based on this kind of result from an application of Bayes’s Theorem, assuming that said analysis had been subjected to rigorous scrutiny and appeared highly convincing to the jury? My gut feeling is no: this belongs to the same class as Yudkowsky’s rational but legally unsuitable evidence. But I’d have to think some more before defending that statement.
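The arithmetic behind “very probably guilty” is the odds form of Bayes’s Theorem: posterior odds equal prior odds times the likelihood ratio. A minimal sketch, taking Sesardic’s 25:1 prior odds of double murder over double SIDS (ignoring the mixed case, as above) and a hypothetical likelihood ratio of exactly 1 for the case-specific evidence; the function names are illustrative only:

```python
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes's Theorem: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1.0 + odds)


# Prior odds of double murder vs. double SIDS, per Sesardic's figure of 25:1.
PRIOR_ODDS_MURDER = 25.0

# Hypothetical: the case-specific evidence is indecisive (likelihood ratio ~1).
LIKELIHOOD_RATIO = 1.0

p_guilty = odds_to_probability(posterior_odds(PRIOR_ODDS_MURDER, LIKELIHOOD_RATIO))
print(f"P(double murder | evidence) = {p_guilty:.3f}")
```

With a likelihood ratio of 1 the posterior is just the prior, 25/26 or roughly 96%, which is why such a verdict would rest on the statistics alone.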
Would it be proper for courts to convict based on this kind of result from an application of Bayes’s Theorem, assuming that said analysis had been subjected to rigorous scrutiny and appeared highly convincing to the jury?
It would be no different from other cases of conviction or acquittal under similar odds. Whether people who are probably guilty, but have an X% chance of being innocent, go free or not should not depend on how jurors arrived at X.
But it seems quite bizarre that somebody might be convicted purely on a statistical basis, i.e. based on a favoured Bayesian prior.
And what about politically controversial statistics and priors? Is there really any particular reason why a Bayesian shouldn’t have a significantly higher prior probability that members of certain ethnic or religious groups commit certain crimes (whatever the reasons for that may be), based on government statistics? And then convict them at a relatively high rate based on this (assuming convictions using Bayesian priors don’t contribute to future statistics used in Bayesian priors—to prevent double-counting of evidence)? Oops!
The judge in the Sally Clark case also expressed the view (in different words) that conviction should not be based on priors alone, but that there should be compelling evidence specific to the case as well.
Here is a paper discussing the problem. I don’t know if you can access that.
It doesn’t seem to me to be a problem that can be resolved easily and simply. For example in a case of terrorism, we may prefer a likelihood of guilt (derived in whatever manner) to be sufficient cause to convict. And if a woman had, say, 5 children die ostensibly of “SIDS”, then even if there was no specific evidence to suggest that it was murder rather than SIDS, the Bayesian likelihood of guilt would be so very high that it would seem to override concerns about convicting based on a prior.
It doesn’t seem to me that criminal cases are merely a matter of convicting based on likelihood of guilt, natural as that may sound. There are other human values to consider.
But it seems quite bizarre that somebody might be convicted...on a favoured Bayesian prior.
How would you describe how an ideal jury should perform its task? Not how real ones work, but an ideal one.
Is there really any particular reason why a Bayesian shouldn’t have a significantly higher prior probability that members of certain ethnic or religious groups commit certain crimes (whatever the reasons for that may be), based on government statistics?
Conviction should not be based merely on probability of guilt; consider for example where society has for good reason excluded rational evidence from being legal evidence, such as with the 5th Amendment in the United States.
The relevant comparison will be between the likelihood of arrest and trial for an innocent person of group X and a guilty person of group X, not between the rates of likelihood of conviction for random individuals of group X and Y, respectively.
If the defendant is a woman, it is not relevant that “women commit (or are convicted of) crimes less often than men”. What is relevant is how likely a female defendant is to be guilty. This may be less or more than for a male defendant, but I don’t consciously have a different prior for defendants based on gender.
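One way to make this concrete: the quantity that matters is P(guilty | on trial, group), which in odds form is the group’s base odds of guilt multiplied by how much likelier a guilty member is to be arrested and tried than an innocent one. A sketch in which every figure and name is hypothetical, chosen only to illustrate the argument:

```python
import math


def defendant_odds(base_odds: float, p_tried_if_guilty: float,
                   p_tried_if_innocent: float) -> float:
    """Odds that a defendant from a given group is guilty.

    Odds form of Bayes's Theorem: the group's base odds of guilt, multiplied by
    how much likelier a guilty member is to be arrested and tried than an
    innocent one.
    """
    return base_odds * (p_tried_if_guilty / p_tried_if_innocent)


# Hypothetical figures: group X has twice group Y's base odds of the crime,
# but innocent members of X are also twice as likely to be arrested and tried.
odds_x = defendant_odds(base_odds=0.02, p_tried_if_guilty=0.5, p_tried_if_innocent=0.002)
odds_y = defendant_odds(base_odds=0.01, p_tried_if_guilty=0.5, p_tried_if_innocent=0.001)

for name, odds in (("X", odds_x), ("Y", odds_y)):
    print(f"group {name}: P(guilty | on trial) = {odds / (1 + odds):.3f}")

# Here the base-rate difference is exactly cancelled by the arrest pattern.
assert math.isclose(odds_x, odds_y)
```

Nothing forces the two effects to cancel like this; change either arrest probability and the two defendants’ odds diverge.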
For example in a case of terrorism, we may prefer a likelihood of guilt (derived in whatever manner) to be sufficient cause to convict.
The likelihood of guilt we convict at should differ among crimes, just as punishment differs among crimes. This is good discrimination, discriminating between importantly different cases on the basis of important differences among them.
Convicting people differently based on the type of evidence that gives the same probability of their having committed the crime is generally baseless discrimination. If two people, Al and Bob, each may have committed the crime of public urination, and Al did it with probability X% considering the legal evidence, and Bob did it with probability Y% considering the legal evidence, and X>Y, I don’t know if one, both, or neither should be convicted. But I do know that if Al is not convicted, then Bob shouldn’t be either.
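The Al and Bob argument holds for any rule that convicts when the probability of guilt clears a single threshold for a given crime; the threshold may differ between crimes, but not between kinds of evidence. A sketch with hypothetical probabilities and thresholds (the `convict` helper is illustrative, not a proposed legal standard):

```python
def convict(p_guilt: float, threshold: float) -> bool:
    """Convict iff the probability of guilt on the legal evidence meets the crime's threshold."""
    return p_guilt >= threshold


P_AL, P_BOB = 0.8, 0.6  # hypothetical probabilities of guilt, with X > Y

for threshold in (0.5, 0.7, 0.9):  # hypothetical thresholds for public urination
    al, bob = convict(P_AL, threshold), convict(P_BOB, threshold)
    # A single threshold can never acquit Al while convicting Bob.
    assert al or not bob
    print(f"threshold={threshold}: Al convicted={al}, Bob convicted={bob}")
```

The three thresholds produce “both”, “one” and “neither” respectively, but never Bob convicted with Al acquitted.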
There are a few exceptions for which good public policy depends on type of evidence, generally involving excluding it entirely for public policy reasons. For example, societies restrict how police may gather evidence and then restrict what they may do with inappropriately gathered evidence to make police comply with the rules.
The relevant comparison will be between the likelihood of arrest and trial for an innocent person of group X and a guilty person of group X, not between the rates of likelihood of conviction for random individuals of group X and Y, respectively.
In the Sally Clark case, prior probabilities are derived from statistics relating to the incidence of SIDS in the general population, in comparison to infant murder. Let us imagine that there were different statistics for group X mothers/babies and group Y mothers/babies. It might then be the case that the incidence of SIDS was lower, and murder higher, in group X than group Y. Therefore, a group Y mother with two dead infants and indecisive evidence from the alleged crime scenes might perhaps be acquitted (e.g. probability of guilt 40%) whereas the group X mother in exactly the same situation is convicted (e.g. probability of guilt 90%).
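The group X / group Y scenario can be sketched directly: identical case-specific evidence (the same likelihood ratio) applied to different group-level priors. The 40% and 90% priors are the paragraph’s illustrative figures; the 0.75 conviction threshold is purely hypothetical, since no court expresses “beyond reasonable doubt” as a number:

```python
def posterior_probability(prior_p: float, likelihood_ratio: float) -> float:
    """Update a prior probability of guilt by a likelihood ratio (odds form of Bayes)."""
    prior_odds = prior_p / (1.0 - prior_p)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


THRESHOLD = 0.75        # hypothetical conviction threshold
LIKELIHOOD_RATIO = 1.0  # indecisive evidence, identical for both mothers

# Illustrative group-level priors of guilt.
priors = {"group Y": 0.40, "group X": 0.90}

verdicts = {}
for group, prior in priors.items():
    p = posterior_probability(prior, LIKELIHOOD_RATIO)
    verdicts[group] = p >= THRESHOLD
    print(f"{group}: P(guilty) = {p:.2f}, convicted = {verdicts[group]}")
```

Because the evidence is the same for both mothers, the difference in verdicts comes entirely from the group-level prior.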
What we are questioning in this case is whether mothers are more likely to murder their babies twice, or to have their babies die innocently twice for no apparent medical reason. But we are using statistics recording the incidence of a single infant dying from “SIDS” or a single infant being murdered. So it is legitimate to use statistics on how likely a random mother in either group is to murder her infant or to lose it to SIDS when assessing a case of double infant murder versus double SIDS.
This is one example in which what you implied is untrue: we have good reason to base our prior on statistics for the general population rather than for people in court accused of this particular offence, since mothers with one SIDS infant death are not necessarily suspected of murder and arrested. And one example is enough to prove the point that the use of Bayesian priors in court may have the unfortunate consequence of allowing conviction rates to differ purely on the basis of sex, ethnicity or social group membership.
But in any case, even if the relevant comparison is “the likelihood of arrest and trial for an innocent person of group X and a guilty person of group X”, there is no particular reason why this should be the same across all possible groups. Therefore this (social and political) problem would be likely to emerge in all sorts of criminal cases if Bayesian priors were to become more widely used in court.
How would you describe how an ideal jury should perform its task?
That’s far too broad a question to expect someone to answer in a comment thread. All I’m saying is that, even setting aside the issue of ethnicity/sex discrimination, the idea of convicting someone primarily on a statistical basis makes me uncomfortable and does not resonate with my values of fairness. I also believe that most other people would feel the same way (for example, the judge in the Sally Clark case evidently agrees).
Since we try and punish criminals purely for our own reasons—not because the God of criminal cases wants us to—there’s no reason why we have to convict based on totally unbiased Bayesian calculations of guilt. If we feel that this conflicts with our other values (apart from the desire to see guilty people punished, and prevent future crimes by them), then perhaps there should be other considerations informing verdicts in criminal trials. And there might well be pragmatic reasons not to do so in any case, since the way in which the criminal justice system works in a country is not causally isolated from the behaviour of its citizens. If people think that the courts are evil, this probably isn’t going to improve any man’s quality of life even if he likes the idea of convicting based primarily on Bayesian priors.
I don’t think there’s an easy answer to this conundrum—but I’m arguing that it is a conundrum that cannot be dismissed with a wave of the hand.