Roughly speaking, I would say that together, all the evidence against Knox and Sollecito, of which the bra clasp and knife are for me the overwhelming majority (everything else being near-negligible), shifted P(guilt) upward by about an order of magnitude—a factor of 10, from a prior somewhere between 0.0001 and 0.001 to a posterior somewhere between 0.001 and 0.01. The Conti-Vecchiotti report cuts that down a bit, though not necessarily hugely, since their findings were pretty much what I already expected them to be.
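As a rough check on the "factor of 10" above, the shift can be written in odds form, where posterior odds equal prior odds times a Bayes factor. The sketch below only back-computes that factor from the endpoint probabilities quoted in this comment; the helper names `odds` and `implied_bayes_factor` are made up for illustration.

```python
def odds(p: float) -> float:
    """Convert a probability into odds in favour."""
    return p / (1.0 - p)


def implied_bayes_factor(prior: float, posterior: float) -> float:
    """Bayes factor that would move `prior` to `posterior` (odds form of Bayes' rule)."""
    return odds(posterior) / odds(prior)


# Endpoints quoted in the comment: prior 0.0001..0.001, posterior 0.001..0.01.
low = implied_bayes_factor(0.0001, 0.001)
high = implied_bayes_factor(0.001, 0.01)
print(f"{low:.3f} {high:.3f}")
```

At probabilities this small, odds and probabilities nearly coincide, which is why the implied factor stays close to 10 at both ends of the range.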
I actually inserted mine into my comment above:
Where does the prior “between 0.0001 and 0.001” come from?
ETA: For example, the fact that the suspect and victim are acquaintances is surely not negligible evidence, given that 52% of murders in the US are committed by a family member or acquaintance and presumably it’s similar in Italy. Perhaps that’s already taken into account in your prior, but without more information the reader has no way to tell. (If it’s too much trouble to write a more self-contained summary at this point, let me know and I’ll go read the old discussions when I get a chance.)
Screened off by the evidence against Rudy Guede.
Separately, I’m having trouble understanding how “suspect and victim are acquaintances” is screened off by “Guede killed Kercher”. According to the wiki:
“If A is a hypothesis and B and C are two pieces of evidence relating to A, then B is said to screen off C from A if P(A|B&C) = P(A|B). That is, if knowing C provides no additional information about A once B is known.”
In the wiki, the example given is A=“Knox killed Kercher”, B=“Guede killed Kercher”, and C=“Kercher was killed”, so it’s trivially true that P(A|B&C) = P(A|B), since B logically implies C. But if we replace C with D=“suspect and victim are acquaintances”, it’s no longer trivially true that B screens off D. Actually, I think it’s false.
Consider the question: does P(A|B&~D) equal P(A|B&D)? Surely the probability that Knox is one of multiple attackers who killed Kercher, given that Guede killed Kercher (and no other evidence), would be smaller if Knox were just a random person with no relationship to Kercher? But “B screens off D” means that P(A|B&D) = P(A|B), which implies P(A|B) = P(A|B&~D).
What if we replace B with E=“Guede killed Kercher and there is no strong evidence of another attacker”? Same thing: we still have P(A|E&D) > P(A|E&~D).
(Note that this is not an argument that Knox killed Kercher, but just that you seem to be using the concept of “screened off” incorrectly, and also wrongly claiming “everything else [besides bra clasp and knife] being near-negligible”.)
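To make the strict definition concrete, here is a minimal sketch that checks whether B screens D off from A in a toy joint distribution. Every number in it is invented purely for illustration (none is an estimate about the case), and the names `joint`, `p_a_given` and `screens_off` are hypothetical helpers.

```python
from itertools import product

# Hypothetical values of P(A | B, D), keyed by (b, d). Invented for illustration only.
P_A_GIVEN = {
    (True, True): 0.02,
    (True, False): 0.001,
    (False, True): 0.05,
    (False, False): 0.002,
}
P_B = 0.5  # assumed P(B); B and D are independent in this toy model
P_D = 0.5  # assumed P(D)


def joint(a: bool, b: bool, d: bool) -> float:
    """Joint probability P(A=a, B=b, D=d) in the toy model."""
    pa = P_A_GIVEN[(b, d)]
    return (P_B if b else 1 - P_B) * (P_D if d else 1 - P_D) * (pa if a else 1 - pa)


def p_a_given(**given: bool) -> float:
    """P(A=True | given), computed by summing the joint table."""
    num = den = 0.0
    for a, b, d in product([True, False], repeat=3):
        world = {"b": b, "d": d}
        if all(world[name] == value for name, value in given.items()):
            den += joint(a, b, d)
            if a:
                num += joint(a, b, d)
    return num / den


def screens_off(tol: float = 1e-9) -> bool:
    """True if B screens D off from A, i.e. P(A|B&D) equals P(A|B&~D) up to `tol`."""
    return abs(p_a_given(b=True, d=True) - p_a_given(b=True, d=False)) <= tol


print(p_a_given(b=True), p_a_given(b=True, d=True), p_a_given(b=True, d=False))
print(screens_off())
```

In this toy model P(A|B) sits strictly between P(A|B&D) and P(A|B&~D), so the strict screening-off condition fails whenever those two differ at all.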
You seem to be calling it “incorrect” if I say that “X = Y” when X is only approximately equal to Y. Obviously you’re right in a literal sense, but it’s an inappropriate criticism in this context.
“Suspect and victim are acquaintances” here is essentially the same event as “Knox’s roommate was killed”—something which significantly raises the prior probability that Knox committed murder. However, once we learn the details of the case, we find that the killing is entirely explained by the actions of Guede. (“Entirely” here is to be understood in an approximative sense.)
While it is perhaps true, using your labels above, that P(A|E&D) > P(A|E&~D), the difference between these quantities is surely very small compared to the difference between P(A|D) and P(A|E&D).
I think all of the disputes that show up between you and raw power are going to be of the form “K thinks that X and Y are close together, while R thinks that X and Y are far apart” or vice versa.
You meant Rolf Nelson, I assume.
natch, sorry
It seems like using your logic, we can similarly say that the evidence against Guede screens off the evidence of the bra clasp and knife. Is that correct? If not, what is the difference between “suspect and victim are acquaintances” and “there is (perhaps not particularly reliable) DNA evidence linking suspect to this murder” that makes Guede screen off one of them but not the other?
Not really. Unlike the fact of the murder of Knox’s roommate, the bra clasp and knife are essentially independent of the evidence against Guede.
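What you said above was:
“While it is perhaps true, using your labels above, that P(A|E&D) > P(A|E&~D), the difference between these quantities is surely very small compared to the difference between P(A|D) and P(A|E&D).”
Is this the criterion you would use for “screened off” in general? If so, suppose we replace D with F=“some DNA evidence exists linking Knox to murder”. (E still being “evidence against Guede”.) Don’t we still have P(A|E&F) - P(A|E&~F) << P(A|F) - P(A|E&F)? To illustrate, P(A|F) = 0.1, P(A|E&F) = 0.01, P(A|E&~F) < 0.001. (These are semi-plausible numbers for illustrating this point, not my actual probabilities.)
In this later comment you say:
“Unlike the fact of the murder of Knox’s roommate, the bra clasp and knife are more or less independent of the evidence against Guede.”
This seems to make more sense, but I’m still having trouble translating it into a technical definition of “screened off”. Can you suggest one?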
It’s easy to break an approximative definition by applying it to a situation where distinctions between orders of error are important. So any such definition, strictly speaking, has to be considered a sort of analogy or metaphor that may not always be applicable to every context.
Strictly speaking, as you know, “E screens F off from A” means P(A|E&F) = P(A|E&~F). So it seems reasonable to say “E approximately screens F off from A” if |P(A|E&F) - P(A|E&~F)| is small. However, what “small” means is context-dependent. When, above, I declined to apply this terminology to E and F, it was because I was mentally comparing |P(A|E&F) - P(A|E&~F)| to |P(A|E) - P(A|E&F)|, rather than to |P(A|F) - P(A|E&F)|. The latter, of course, is much larger. So I don’t suppose I can really stop you from applying the approximative definition of “screening off” in this situation if what you’re interested in is P(A|F) vs P(A|E&F) (a large downward jump) rather than P(A|E) vs P(A|E&F) (a small upward jump).
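The context-dependence of "small" can be seen by measuring the same gap against the two yardsticks mentioned above. This sketch reuses the illustrative numbers given earlier in the thread; the value for P(A|E) is not stated anywhere in the discussion and is assumed here only so that the second comparison can be computed.

```python
# Illustrative numbers from earlier in the thread (explicitly not anyone's real estimates).
p_a_given_f = 0.1          # P(A|F)
p_a_given_ef = 0.01        # P(A|E&F)
p_a_given_e_not_f = 0.001  # P(A|E&~F); the thread says "< 0.001", so this is the upper bound
p_a_given_e = 0.002        # P(A|E): NOT given in the thread, assumed for illustration

# The quantity that "approximately screens off" asks to be small.
screening_gap = abs(p_a_given_ef - p_a_given_e_not_f)

# The same gap measured against two different yardsticks.
relative_to_f_drop = screening_gap / abs(p_a_given_f - p_a_given_ef)  # vs the large downward jump
relative_to_e_rise = screening_gap / abs(p_a_given_e - p_a_given_ef)  # vs the small upward jump

print(relative_to_f_drop, relative_to_e_rise)
```

Under these assumed numbers the gap is a small fraction of the drop from P(A|F), but comparable in size to the rise from P(A|E), so whether it counts as "small" depends on which comparison is in play.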
What do you say we table this discussion about “approximately screens off”? (I’m thinking of writing a discussion post asking LW what a good, i.e., generally useful, definition of it would be. Maybe it doesn’t have to be context-dependent, or could be less context-dependent, if we talk about P(A|E&F) / P(A|E&~F) instead of P(A|E&F) - P(A|E&~F).)
For now, perhaps you can just tell me what mathematical statement you actually had in mind, when you said “Screened off by the evidence against Rudy Guede”?
P(A|E&D) is much closer to P(A) than to P(A|D).
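One possible way to make "much closer" checkable is to measure distance on a log scale, in the spirit of the ratio idea floated earlier in the thread. The probabilities below are hypothetical placeholders chosen only to show the shape of the claim, and `log_distance` is an illustrative helper, not a standard definition.

```python
import math


def log_distance(p: float, q: float) -> float:
    """Distance between two probabilities, in orders of magnitude."""
    return abs(math.log10(p / q))


# Hypothetical values, chosen only to illustrate the statement.
p_a = 1e-4           # P(A)
p_a_given_d = 1e-2   # P(A|D)
p_a_given_ed = 2e-4  # P(A|E&D)

to_prior = log_distance(p_a_given_ed, p_a)
to_d_only = log_distance(p_a_given_ed, p_a_given_d)
print(to_prior < to_d_only)
```

With these placeholder values the statement holds on a log scale; it would hold on a linear scale as well, since the two measures only come apart when the quantities being compared span very different magnitudes.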
The point is that the calculation you gave is missing too many steps to be useful for someone just coming to the discussion.
To take another example, does your prior take into account the gender of the suspect? (Females commit far fewer murders than males.) Or is that also screened off by some other evidence?
I suspect that a lot of the details you’re wondering about will quickly emerge in the discussion with Rolf, since (1) our opinions are widely separated, which seems to imply very different-looking calculations, with substantial inferential gaps to be bridged, and (2) that discussion is just getting started.
That said, I’m not sure which missing steps you consider the most important. A very short case summary from my point of view would be something like “student killed by burglar; housemate and boyfriend blamed before burglar discovered; after catching burglar, police filter evidence to fit three-person theory instead of dropping initial idea.” Is that helpful at all?
A reference class that gives an upper bound for my prior would be “intelligent 20-year-old female college student with no criminal history commits murder”.
Wikipedia says:
“At 0.013 per 1,000 people, Italy has the 47th highest murder rate in the world.”
Which gives no more than 0.000013 probability that Knox is a murderer if all we know is that she lives in Italy. I guess “intelligent 20-year-old female college student with no criminal history” is less likely to commit murder than average, so I’m still confused how you got “between 0.0001 and 0.001”.
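For anyone following the arithmetic, this short sketch converts the quoted rate into a per-person figure and compares it with the stated prior range. It takes the 0.013 per 1,000 figure quoted in this thread at face value and does nothing beyond unit conversion; the variable names are illustrative.

```python
import math

rate_per_1000 = 0.013               # figure quoted from Wikipedia in this thread
p_from_rate = rate_per_1000 / 1000  # per-person figure

prior_low, prior_high = 1e-4, 1e-3  # the range stated for the prior

print(p_from_rate)
# How many times larger the stated prior range is than the rate-based figure.
print(prior_low / p_from_rate, prior_high / p_from_rate)
```

So the stated range is roughly one to two orders of magnitude above the rate-based figure, which is the gap being asked about.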
Well, the answer I suppose is that I wasn’t taking the country into account.
However, if you agree that “between 0.0001 and 0.001” is an upper bound, that surely suffices! The important kind of confusion would be where you think my prior is too low, rather than too high.
I was trying to understand what evidence has been taken into account in your prior (i.e., is there some other information that might be considered Bayesian evidence against Knox, but which is already in your prior), so that I can understand what other evidence you consider “negligible”. I think at this point that confusion has been resolved.
I still wonder why the two sides don’t each post a more detailed Bayesian calculation. Let’s say A=“Knox killed Kercher”, B=“Kercher has been killed and Knox lived in Italy and is an intelligent 20-year-old female college student with no criminal history”, C=“evidence against Guede”, D=“Knox and Kercher were roommates”, E=“evidence of a staged burglary”, F=“bra clasp and knife”, G=“all other information about the case”. What are
P(A|B)
P(A|B&C)
P(A|B&C&D)
P(A|B&C&D&E)
P(A|B&C&D&E&F)
P(A|B&C&D&E&F&G)
(Or some other set of evidence and order of evaluation that might be more appropriate.) Wouldn’t that help to quickly pinpoint where your disagreements are?
I’ll redefine slightly:
A := “Knox killed Kercher, given background info about both, but not the fact of their acquaintance”. P(A) = tiny.
B := “Kercher killed”. P(A|B) = approximately P(A). (We are not yet given that they were roommates.)
C := “evidence against Guede”. P(A|B&C) = approximately P(A). (No significant connection between Guede and Knox.)
D := “Knox and Kercher were roommates”. P(A|B&C&D) = slightly higher than P(A), but still well below the threshold of consideration.
E := “Facts cited as evidence of staged burglary”. P(A|B&C&D&E) = approximately P(A|B&C&D). (Likelihood ratios involved are close to unity; certainly small relative to P(~A)/P(A).)
F := “bra clasp and knife”. P(A|B&C&D&E&F) = possibly as much as an order of magnitude higher than P(A|B&C&D). (Explaining results is a minor puzzle.)
G := “all other information”. P(A|B&C&D&E&F&G) = approximately P(A|B&C&D&E&F). (Other evidence weak; slightly inculpatory facts canceled out by slightly exculpatory facts.)
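The qualitative chain above can be sketched as a sequence of odds updates. All numbers here are hypothetical stand-ins for the verbal estimates ("tiny", "slightly higher", "an order of magnitude"), not figures given by anyone in the thread, and each likelihood ratio is assumed to be conditional on the evidence already taken into account before it.

```python
def odds(p: float) -> float:
    """Convert a probability into odds in favour."""
    return p / (1.0 - p)


def prob(o: float) -> float:
    """Convert odds in favour back into a probability."""
    return o / (1.0 + o)


prior = 1e-4  # hypothetical stand-in for "tiny"

# Hypothetical likelihood ratios, each conditional on the evidence listed before it.
likelihood_ratios = [
    ("B", 1.0),   # Kercher killed: approximately no change
    ("C", 1.0),   # evidence against Guede: approximately no change
    ("D", 2.0),   # roommates: "slightly higher" (assumed factor)
    ("E", 1.0),   # staged-burglary facts: ratio close to unity
    ("F", 10.0),  # bra clasp and knife: up to an order of magnitude
    ("G", 1.0),   # everything else: roughly cancels out
]


def update_chain(start: float, ratios: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Return the running posterior probability after each piece of evidence."""
    current = odds(start)
    trail = []
    for label, ratio in ratios:
        current *= ratio
        trail.append((label, prob(current)))
    return trail


trail = update_chain(prior, likelihood_ratios)
for label, p in trail:
    print(f"after {label}: {p:.6f}")
```

Laying the chain out this way makes it easy for two people to compare their ratios step by step and see which single step accounts for most of the disagreement.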
Thanks, that’s very helpful. Perhaps you could copy this to the main debate branch, so Rolf would see it and possibly respond in a similar fashion? Also, to seek a bit more clarification, what is your estimate of P(A|B&C&D) / P(A|B&C)?