the presumption of innocence in modern legal systems means that the job of the jury (and by extension the legal teams) is not just to arrive at a probability of guilt but at a certain level of confidence around that probability.
I don’t think those are two separate things. What does it mean to be 50% sure that there is a 90% probability someone committed the murder? If you’re not sure you should just lower your probability of guilt.
I think Eliezer had it right that when answering the question you should give your best estimate of the probability that each suspect committed the murder. The question of what probability corresponds to ‘beyond reasonable doubt’ is a separate one and isn’t actually raised in the original question. Personally I think you’d have to assign at least a 90% probability of guilt to convict but the exact threshold is open to debate.
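To make the threshold question concrete, here is a toy calculation of my own (not something raised in the thread, and the cost ratio is purely an assumption): if you fix how much worse a wrongful conviction is than a wrongful acquittal, the conviction threshold falls out of an expected-cost comparison.

```python
# A toy sketch, not a claim about how the law actually sets the standard.
# Convicting is the better bet when (1 - p) * cost_false_conviction
# is smaller than p * cost_false_acquittal.

def conviction_threshold(cost_false_conviction: float,
                         cost_false_acquittal: float) -> float:
    """Probability of guilt above which convicting minimises expected cost."""
    return cost_false_conviction / (cost_false_conviction + cost_false_acquittal)

# If convicting an innocent person is treated as nine times worse than
# acquitting a guilty one (an illustrative assumption):
print(conviction_threshold(9.0, 1.0))  # 0.9
```

With a 9:1 cost ratio the threshold comes out at exactly the 0.9 suggested above; a different ratio gives a different threshold, which is one way of framing why the exact number is open to debate.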
I just went to Wikipedia and found a more articulate version of what I’m trying to say:
Gardner-Medwin argues that the criterion on which a verdict in a criminal trial should be based is not the probability of guilt, but rather the probability of the evidence, given that the defendant is innocent (akin to a frequentist p-value). He argues that if the posterior probability of guilt is to be computed by Bayes’ theorem, the prior probability of guilt must be known. This will depend on the incidence of the crime, which is an unusual piece of evidence to consider in a criminal trial. Consider the following three propositions:
A: The known facts and testimony could have arisen if the defendant is guilty,
B: The known facts and testimony could have arisen if the defendant is innocent,
C: The defendant is guilty.
Gardner-Medwin argues that the jury should believe both A and not-B in order to convict. A and not-B implies the truth of C, but the reverse is not true. It is possible that B and C are both true, but in this case he argues that a jury should acquit, even though they know that they will be letting some guilty people go free. See also Lindley’s paradox.
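As a rough illustration of the role the prior plays here (my own sketch with made-up numbers, not Gardner-Medwin’s): the ‘not-B’ check only needs the probability of the evidence given innocence, whereas the Bayesian posterior also swings with the prior, i.e. with the incidence of the crime.

```python
# Hedged illustration with invented numbers: the same evidence, very
# different posteriors depending on the prior probability of guilt.

def posterior_guilt(prior: float, p_evidence_if_guilty: float,
                    p_evidence_if_innocent: float) -> float:
    """P(guilty | evidence) via Bayes' theorem."""
    num = prior * p_evidence_if_guilty
    den = num + (1 - prior) * p_evidence_if_innocent
    return num / den

p_e_guilty, p_e_innocent = 0.8, 0.01   # proposition A plausible, B unlikely

for prior in (0.5, 0.1, 0.001):
    print(prior, round(posterior_guilt(prior, p_e_guilty, p_e_innocent), 3))
# prior 0.5 -> ~0.99, prior 0.1 -> ~0.9, prior 0.001 -> ~0.07
```

The ‘evidence given innocence’ number is the same in all three rows; only the prior changes.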
I am not really a stats person and I’m not prepared to defend Gardner-Medwin’s model as being correct—but right or wrong, it’s a better description than Bayesian inference of most people’s intuitive concept of the task of a juror.
In other words, when I imagine myself as a juror I’m automatically more concerned about a false positive (convicting an innocent person), and I will intuitively try to answer the question “has the prosecution proved its case” rather than “is this person guilty.”
If asked to answer the second question and quantify my odds of guilt, I’m likely to understate them, precisely because I can’t separate that estimate from the real-world effect of a guilty verdict.
Or in your terms, the “question of what probability corresponds to ‘beyond reasonable doubt’ [or whatever the equivalent standard is in Italy]” can’t be completely excluded from the question when we imagine ourselves as jurors, only made implicit.
This reminds me slightly of Eliezer’s “true Prisoner’s Dilemma” article, which I really liked. Just as you can’t posit that someone is my confederate (in his case) and then ask me to consider them in a purely selfish, impartial way—you can’t tell me I’m a juror and then ask me to make a purely impartial assessment. I’m describing a much weaker effect than he was, and maybe it’s more socially conditioned than inherent to human nature, but I think the general concept is the same.
So… better to say “forget the fact that there’s even a trial going on, just imagine that tomorrow the absolute truth will be revealed and you have to bet on it now.”
He argues that if the posterior probability of guilt is to be computed by Bayes’ theorem, the prior probability of guilt must be known. This will depend on the incidence of the crime, which is an unusual piece of evidence to consider in a criminal trial.
This is an interesting point, and one where I think the legal system is wrong from a strict rationality standpoint, but I can see the argument given that juries are human and so not very rational.
It is common for juries either not to be given information which is very relevant to the prior probabilities of guilt, or to be instructed to discard it. The debate in the UK over whether previous convictions should be admissible as evidence is a good example. From a strict Bayesian/rationality point of view, all information is potentially relevant and more information should only improve the decisions made by the jury.
Information about previous convictions is very relevant when determining priors for certain types of crimes, particularly sexual offences, which triggered calls for changes to the law in the UK. The counter-argument is that telling juries about prior offences will bias them too much against the defendant, in recognition of the fact that juries are prone to certain kinds of irrational bias.
The rules about keeping juries away from potentially prejudicial media coverage exist for similar reasons. The failure to avoid this in the Amanda Knox case is one of the criticisms leveled against the prosecution by her supporters.
From a strict Bayesian/rationality point of view, all information is potentially relevant and more information should only improve the decisions made by the jury.
From a really strict Bayesian point of view, more information can certainly make decisions worse. Only perfect information (or, perhaps, arbitrarily close-to-perfect information?) necessarily makes decisions better. Of course, perfect information includes the bit of information saying whodunnit.
Here’s another way to put it. Given a jury, you (the court) can give them information that will cause them to decide guilty, or you can give them other information that will cause them to decide not guilty. In many cases both sets of information will be true—just different subsets of the complete truth. How do you decide what to tell them?
From a really strict Bayesian point of view, more information can certainly make decisions worse. Only perfect information (or, perhaps, arbitrarily close-to-perfect information?) necessarily makes decisions better.
Not true. A perfect Bayesian updater will never make worse decisions in the light of new information. If new information causes worse decisions, that is a sign that the new information was not appropriately weighted according to the trustworthiness of the information source.
In other words, false information can only make for worse decisions if it is treated as true. The only reason you would treat false information as true is that you placed too much trust in the source of the information. The problem is not the receipt of the new information, it is incorrect updating due to incorrect priors regarding the reliability of the information source. That may be a common problem for actual imperfect humans but it is not an indication that acquiring new information can ever lead to worse decisions for a theoretical perfect Bayesian updater.
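To sketch what ‘appropriately weighted according to the trustworthiness of the source’ could look like numerically (a deliberately crude model of my own, with invented numbers):

```python
# Crude sketch: a source that tells the truth with probability r and asserts
# the opposite otherwise. The updater discounts the report accordingly.

def posterior_after_report(prior: float, reliability: float) -> float:
    """P(claim is true | source asserts the claim) under this simple model."""
    num = prior * reliability
    den = num + (1 - prior) * (1 - reliability)
    return num / den

prior = 0.3
for r in (0.5, 0.7, 0.9):
    print(r, round(posterior_after_report(prior, r), 3))
# A source no better than chance (r = 0.5) leaves the prior exactly where it was.
```

On this model a report from a source the updater correctly distrusts barely moves its beliefs, which is the sense in which false information only hurts if it is over-trusted.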
That’s not quite right. The provision of all true but biased information (e.g. only those facts that are consistent with guilt) without complete awareness of the exact nature of the bias applied can increase the chances of an error.
Even unbiased info can’t be said to always help. A good example is someone who has crazy priors. Suppose someone has the crazy prior that with probability .99999 creationism is true. If they have somehow acquired evidence that overcomes this prior, but further information about problems with evolutionary theories would leave them with still strong but not convincing evidence that evolution is true, then providing them with that evidence increases their chance of error.
More generally, disagreement in priors forces one to believe that others will make better decisions if evidence that exacerbates the errors in their priors is withheld.
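Here is a rough numerical rendering of the creationism example (every number below is invented by me for illustration):

```python
# All priors and likelihood ratios here are made up to mirror the example.

def update(prob: float, likelihood_ratio: float) -> float:
    """Bayesian update in odds form: multiply the odds by the likelihood ratio."""
    odds = prob / (1 - prob) * likelihood_ratio
    return odds / (1 + odds)

p = 1e-5                     # crazy prior: P(evolution) = 0.00001
p = update(p, 10_000_000)    # strong evidence for evolution overcomes it
print(round(p, 3))           # ~0.99
p = update(p, 0.05)          # true but one-sided "problems with evolution" info
print(round(p, 3))           # ~0.83: still strong, but below e.g. a 0.9 bar
```

The second piece of information is perfectly genuine, yet handing it over pushes this person back across whatever acceptance threshold they were using.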
In addition to my first reply: even if all the data the updater is given is completely true, incomplete data can still lead to worse decisions. Here’s a trivial example.
This is a list of true facts (not really, just in the example): (1) B was murdered. (2) A was in town on the day of the murder. (3) A is green. Green people are less likely to commit murder than the general population. (4) A was B’s close friend. Most murders are committed by friends.
A perfect Bayesian judge is asked: given fact (1), did A murder B? He has some prior probability for this. Then he is given fact (2). His probability (of A’s guilt) goes up. Then he is given fact (3); the probability goes down. Then fact (4); it goes up again. And so on. This works independently of whether A murdered B or not.
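The back-and-forth movement described above is easy to see in odds form (the likelihood ratios below are invented; this is only meant to show the mechanics):

```python
# Sequential updating on facts (2)-(4); every likelihood ratio is made up.

def update_odds(odds: float, likelihood_ratio: float) -> float:
    """Multiply the odds of guilt by P(fact | guilty) / P(fact | innocent)."""
    return odds * likelihood_ratio

odds = 0.001 / 0.999  # arbitrary prior odds of "A murdered B" given fact (1)
for fact, lr in [("(2) A was in town", 5.0),
                 ("(3) A is green", 0.3),
                 ("(4) A was B's close friend", 30.0)]:
    odds = update_odds(odds, lr)
    print(fact, "-> P(A murdered B) =", round(odds / (1 + odds), 4))
```

Each update is correct given the fact supplied, and the estimate still zig-zags whichever way the truth actually lies.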
I think you’re confusing two things here: a) is the judge making the best decision in light of the information available to him and b) in light of new information is his probability moving in the ‘correct’ direction given what really happened. The question of b) is irrelevant: the best we can hope for is the judge to make the best possible decision given the available information. A perfect Bayesian judge will do that.
The problem in the real world of humans is not whether giving them new information will lead them closer to or further from ‘the truth’ but whether they will update correctly in light of the new information. The reason certain facts are withheld from juries is that it is believed they will not update correctly on the new information but rather will be consistently biased in a particular direction by it to an extent not warranted by the facts.
I think you’re confusing two things here: a) is the judge making the best decision in light of the information available to him and b) in light of new information is his probability moving in the ‘correct’ direction given what really happened. The question of b) is irrelevant: the best we can hope for is the judge to make the best possible decision given the available information. A perfect Bayesian judge will do that.
That’s right.
The reason certain facts are withheld from juries is that it is believed they will not update correctly on the new information but rather will be consistently biased in a particular direction by it to an extent not warranted by the facts.
That’s not right. Even if the juries always update correctly on the new information they may still become more distant from the truth. The jury may be performing your (a) perfectly, but we really do want (b). My point was that even with a perfect Bayesian jury, the discrepancy between executing (a) and (b) will cause us to withhold certain facts sometimes, because the partial presentation of the facts will cause the jury’s (a) to be closer to the actual (b), the truth.
The jury may be performing your (a) perfectly, but we really do want (b).
We might really want to ride a unicorn as well, but it’s not really an option. Well, at least riding a unicorn is logically possible I suppose, unlike making decisions better than the best possible given the information available… The only way to get closer to the truth when you already update perfectly is to seek out more information.
The legal system is not designed around perfect updaters for the same reason it’s not designed around unicorns. We wouldn’t need judges and juries if we had perfect Bayesian updaters—we could just make those updaters investigator, judge, jury and executioner and tell them to deliver punishment when they reached a certain threshold level of probability.
The idea of withholding certain information from jurors is predicated on the idea that jurors are less good updaters than judges. Whether that is true or not is another question.
This is all true about our system. But my point still stands: even with perfect updaters there can still be a reason to withhold information. It’s true that it’s usually an insignificant concern with human juries, because other problems swamp this one.
You originally said:
From a strict Bayesian/rationality point of view, all information is potentially relevant and more information should only improve the decisions made by the jury.
If “improve” means “bring their decisions closer to the objective perfect-knowledge truth” then that statement is false, as I have explained. I don’t see what else “improve” can mean here—it can’t refer to the jury’s correctly updating if we assume that their updating is perfect (“strictly Bayesian”).
The only way for a perfect Bayesian updater to move closer to the truth from its own perspective is to seek out more information. Some new pieces of information could move its probability estimates in the wrong direction (relative to the unknown truth) but it cannot know in advance what those might be.
Another agent with more information could attempt to manipulate the perfect updater’s beliefs by selectively feeding it information (it would have to be quite subtle about this and quite good at hiding its own motives to fool the perfect updater, but with a sufficient informational advantage it should be possible). Such an agent may or may not be interested in moving the perfect updater’s beliefs closer to the truth as it perceives it, but unless it has perfect information it can’t be sure what the truth is anyway. If the agent wishes to move the perfect updater in the direction of what it perceives as the truth then its best tactic is probably just to share all of its information with the perfect updater. Only if it wishes to move the perfect updater’s beliefs away from its own should it selectively withhold information.
‘Improve’ for a perfect Bayesian can only mean ‘seek out more knowledge’. A perfect Bayesian will also know exactly which information to prioritize seeking out in order to get maximum epistemic bang for its buck. A perfect Bayesian will never find itself in a situation where its best option is to avoid finding out more information or to deliberately forget information in order to move closer to the objective truth. An external agent with more knowledge could observe that the perfect Bayesian on occasion updated its probabilities in the ‘wrong’ direction (relative to the truth as perceived by the external agent) but that does not imply that the perfect Bayesian should have avoided acquiring the information given its own state of knowledge.
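For what it’s worth, the usual way to cash this out is that the expected value of (free) evidence is non-negative for a perfect Bayesian: any individual observation can push the estimate the wrong way, but in expectation over its own beliefs, deciding after looking is never worse than deciding before. A toy check of that claim (the binary ‘test’, its error rates and the priors are all invented by me):

```python
# Toy check, not a proof: expected accuracy of a threshold decision before
# and after observing a noisy binary piece of evidence.

def accuracy_without_test(prior: float) -> float:
    """Best achievable P(correct verdict) deciding on the prior alone."""
    return max(prior, 1 - prior)

def accuracy_with_test(prior: float, tpr: float, fpr: float) -> float:
    """Expected P(correct verdict) if we decide after seeing the test result."""
    p_pos = prior * tpr + (1 - prior) * fpr
    p_neg = 1 - p_pos
    post_pos = prior * tpr / p_pos if p_pos > 0 else 0.0
    post_neg = prior * (1 - tpr) / p_neg if p_neg > 0 else 0.0
    return (p_pos * max(post_pos, 1 - post_pos)
            + p_neg * max(post_neg, 1 - post_neg))

for prior in (0.2, 0.5, 0.8):
    print(prior,
          accuracy_without_test(prior),
          round(accuracy_with_test(prior, tpr=0.9, fpr=0.2), 3))
```

The with-test column never comes out below the without-test one (Jensen’s inequality applied to the convex max(p, 1 - p)), even though any single test result can move the posterior away from the truth.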
If the agent wishes to move the perfect updater in the direction of what it perceives as the truth then its best tactic is probably just to share all of its information with the perfect updater. Only if it wishes to move the perfect updater’s beliefs away from its own should it selectively withhold information.
Not so. The agent in question has an information advantage over the other, including information about what the intended pupil believes about the aspiring teacher. It knows exactly how the pupil will react to any given stimulus. The task then is to feed it whichever combination of information leads to the state closest to that of the teacher. This is probably not sharing all information. It is probably sharing nearly all information, with some perfectly selected differences or omissions here and there.
Dan’s point still stands even in this idealised case.
The updater may be perfect, but because the updater’s knowledge is imperfect, the updater cannot correctly judge the reliability of the source of information, and therefore it may assign that information incorrect (imprecise) weight or even take false information for true.
We’re talking about a hypothetical perfect updater that always updates correctly on new information. If it’s updating incorrectly on new information because it puts too much trust in that information given its current state of knowledge, then it’s not a perfect updater.
What you call a perfect updater here is an agent with perfectly complete knowledge. That’s the only way to always judge correctly the weight of new information. Of course, such an agent never needs to update, at least not about past events.
No, I’m not talking about an agent with perfect knowledge. I’m talking about a perfect updater. A perfect Bayesian updater comes to the best possible decisions given the available information. Giving such a perfect updater new information never makes its decisions worse, because by definition it always makes the best possible decision given the information. This is a different question from whether its probability estimates move closer to or further from ‘the truth’ as judged from some external perspective where more information is available.
The concern with imperfect updaters like humans is that giving them more information leads them further away from the theoretical best decision given the information available to them, not that it leads them further away from ‘the truth’. In other words, giving people more information can lead them to make worse decisions (less like the decisions of a perfect Bayesian updater) which may or may not mean their opinions become more aligned with the truth.
These are both concerns, and if we could replace humans with perfect Bayesian updaters, we’d notice the only remaining concern a lot more—namely, that giving the updater more (true) information can cause it to move away from the objective truth we are trying to reach (the truth that is only knowable with perfect information).
Who would decide which information to withhold in that case? The only way you could be qualified to judge what information to withhold would be if you yourself had perfect information, in which case there’d really be no need for the jury and you could just pass judgement yourself. The only way for a perfect updater to get closer to the truth is for it to seek out more information.
That’s a strong claim. Is there a formal proof of this?
I don’t think a formal proof is needed. An agent with imperfect knowledge does not, by definition, know what ‘the truth’ is. It may be able to judge the impact of extra information on another agent and whether that information will move the other agent closer to or further from the first agent’s own probability estimates, but it cannot know whether that has the result of moving the second agent’s probability estimates closer to ‘the truth’, because it does not know ‘the truth’.
Point taken. If we assume the Court-agent can effectively communicate all of its knowledge to the Jury-agent, then the Jury can make decisions at least as good as the Court’s. Or the Jury could communicate all of its knowledge to the Court and then we wouldn’t need a Jury. You’re right about this.
But as long as we’re forced to have separate Court and Jury who cannot communicate all their knowledge to one another—perhaps they can only communicate all the knowledge directly relevant to the trial at hand, or there are bandwidth constraints, or the Court cannot itself appear as a witness to provide new information to the Jury—then my point stands.