I would have thought more than a footnote would have been helpful. To avoid lazy other-optimizing, I’ve written some content below which you may use/adapt/modify as you see fit.
The odds form of Bayes’ theorem is this:
P(a|b)/P(~a|b) = P(a)/P(~a) × P(b|a)/P(b|~a)
In English, the ratio of the posterior probabilities (the posterior odds of a) equals the product of the ratio of the prior probabilities and the likelihood ratio.
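As a quick numeric illustration of the odds form, here is a minimal sketch. The prior of 0.2 and the likelihood ratio of 8 are made-up numbers chosen only to make the arithmetic easy to follow, and the function names are my own.

```python
def posterior_odds(prior_prob, likelihood_ratio):
    """Odds form of Bayes: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    return prior_odds * likelihood_ratio


def odds_to_prob(odds):
    """Convert odds in favour of a back into a probability of a."""
    return odds / (1.0 + odds)


# Hypothetical inputs, not estimates for the DN script.
prior = 0.2   # P(a)
lr = 8.0      # P(b|a) / P(b|~a)

post_odds = posterior_odds(prior, lr)
post_prob = odds_to_prob(post_odds)
print(post_odds, post_prob)
```

With these inputs the prior odds are 0.25, so the posterior odds come out at 2 (i.e. 2:1 in favour of a).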
What we are interested in is the likelihood ratio p(e|is-real)/p(e|is-not-real), where e is all the external and internal evidence we have about the DN script.
e is the conjunction of the 13 individual pieces of evidence, which I’ll refer to as e1 through e13:
e = e1 & e2 & … & e13
So the likelihood ratio we’re after can be written like this:
p(e|is-real)/p(e|is-not-real) = p(e1&e2&...&e13|is-real)/p(e1&e2&...&e13|is-not-real)
I abbreviate p(b|is-real)/p(b|is-not-real) as LR(b), and p(b|is-real&c)/p(b|is-not-real&c) as LR(b|c).
Now, it follows from probability theory that the above is equivalent to
LR(e) = LR(e1) LR(e2|e1) LR(e3|e1&e2) LR(e4|e1&e2&e3) … LR(e13|e1&e2&…&e12)
(The ordering is arbitrary.)
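The chain-rule decomposition can be checked numerically on a toy case with just two pieces of evidence. This is only a sketch: the joint probabilities below are invented for illustration, and the helper names `p` and `lr` are mine.

```python
# Hypothetical joint probabilities p(e1, e2 | h) for two binary pieces of
# evidence; outcome (1, 0) means "e1 holds, e2 does not".
joint = {
    "is-real":     {(1, 1): 0.60, (1, 0): 0.20, (0, 1): 0.10, (0, 0): 0.10},
    "is-not-real": {(1, 1): 0.10, (1, 0): 0.10, (0, 1): 0.20, (0, 0): 0.60},
}


def p(event, h):
    """Probability under hypothesis h that the outcome satisfies `event`."""
    return sum(prob for outcome, prob in joint[h].items() if event(outcome))


def lr(event, given=lambda o: True):
    """LR(event | given) = p(event|is-real&given) / p(event|is-not-real&given)."""
    def both(o):
        return event(o) and given(o)
    num = p(both, "is-real") / p(given, "is-real")
    den = p(both, "is-not-real") / p(given, "is-not-real")
    return num / den


def e1(o):
    return o[0] == 1


def e2(o):
    return o[1] == 1


def e1_and_e2(o):
    return e1(o) and e2(o)


lr_joint = lr(e1_and_e2)               # LR(e1 & e2)
lr_chain = lr(e1) * lr(e2, given=e1)   # LR(e1) * LR(e2|e1)
print(lr_joint, lr_chain)
```

Both quantities agree (up to floating-point rounding), which is all the identity claims; no independence assumption is needed at this stage.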
Now comes the point where the assumption of conditional independence simplifies things greatly. The assumption is that the “impact” of each piece of evidence (i.e. the likelihood ratio associated with it) does not vary based on what other evidence we already have. That is, for any piece of evidence ei, its likelihood ratio is the same no matter what other evidence you condition on:
LR(ei|c) = LR(ei) for any conjunction c of other pieces of evidence
Under this assumption, the expression for LR(e) collapses to a plain product:
LR(e) = LR(e1) LR(e2) LR(e3) … LR(e13)
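Under conditional independence the whole calculation is just a product, or equivalently a sum of log likelihood ratios. A minimal sketch follows; the three likelihood ratios are placeholders, not the values from the essay.

```python
import math

# Hypothetical per-evidence likelihood ratios LR(e1), LR(e2), LR(e3).
lrs = [2.0, 1.5, 4.0]

# Direct product, as in LR(e) = LR(e1) LR(e2) ... LR(e13).
combined = math.prod(lrs)

# Same thing in log space, where independent evidence simply adds.
log_combined = sum(math.log(x) for x in lrs)

print(combined, math.exp(log_combined))
```

Working in log space is a design choice rather than a necessity: with 13 factors the plain product is fine, but logs make it easy to see how much each piece of evidence contributes.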
However, the conditional independence assumption is likely to have a substantial impact on the value LR(e) takes, because most pieces of evidence are expected to correlate positively with one another rather than being independent. For example, if you know that the script is a 20,000-word-long Hollywood plot and that the stylometric analysis seems to check out, then if you are dealing with a fake script (is-not-real), it is an extremely elaborate fake. In that case (e.g.) the PDF metadata are almost certain to “check out” as well, and so provide much weaker evidence for is-real than the calculation assuming conditional independence suggests. On the other hand, the evidence of legal takedowns seems unaffected by this concern, as even a competent faker would hardly be expected to create the evidence of takedowns.
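To make the size of this effect concrete, here is a sketch comparing the naive product with the chain-rule value for two correlated pieces of evidence (stylometry and PDF metadata). Every probability below is an invented placeholder; the only thing the example is meant to show is the direction of the error.

```python
# Hypothetical probabilities, chosen for illustration only.
p_style_real = 0.9              # p(stylometry checks out | is-real)
p_style_fake = 0.1              # p(stylometry checks out | is-not-real)

p_meta_real = 0.9               # p(metadata check out | is-real); assumed
                                # not to depend on stylometry for a real script
p_meta_fake = 0.3               # p(metadata check out | is-not-real), marginal
p_meta_fake_given_style = 0.85  # same, but given the fake already passed
                                # stylometry (i.e. it is an elaborate fake)

lr_style = p_style_real / p_style_fake
lr_meta = p_meta_real / p_meta_fake
lr_meta_given_style = p_meta_real / p_meta_fake_given_style

naive = lr_style * lr_meta              # assumes conditional independence
exact = lr_style * lr_meta_given_style  # chain rule: LR(e1) * LR(e2|e1)

print(naive, exact)
```

With these made-up numbers the metadata's likelihood ratio drops from 3 to just above 1 once stylometry is taken into account, so the naive product overstates the combined evidence by nearly a factor of three.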
[The suggested back-of-the-envelope calculation could go along the lines of the last paragraph, or as I said in the grandparent you might get rid of most of the problematic correlations by considering 2-3 hypotheses about the faker’s level of skill and motivation (via a likelihood vector instead of ratio). My own guess is that stylometrics pretty much screens off all other internal evidence as well as dating and (most of) credit, but leaves takedown unaffected.]
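The likelihood-vector idea could be sketched roughly as follows. I split is-not-real into two hypothetical sub-hypotheses about the faker (sloppy versus elaborate) and assume the evidence is independent once the hypothesis is fixed. The hypothesis names, priors, and likelihoods are all placeholders I made up, and only three of the 13 pieces of evidence are shown.

```python
# Hypothetical priors over three hypotheses (must sum to 1).
priors = {"real": 0.5, "sloppy_fake": 0.4, "elaborate_fake": 0.1}

# Hypothetical likelihood vectors: p(evidence | hypothesis).
likelihoods = {
    "stylometry": {"real": 0.9, "sloppy_fake": 0.05, "elaborate_fake": 0.8},
    "metadata":   {"real": 0.9, "sloppy_fake": 0.20, "elaborate_fake": 0.9},
    "takedowns":  {"real": 0.5, "sloppy_fake": 0.05, "elaborate_fake": 0.05},
}

# Multiply each prior by the likelihood of every observed piece of evidence,
# treating the evidence as independent *given* the hypothesis.
unnormalized = dict(priors)
for vector in likelihoods.values():
    for h in unnormalized:
        unnormalized[h] *= vector[h]

total = sum(unnormalized.values())
posterior = {h: v / total for h, v in unnormalized.items()}

# Collapse back to is-real versus is-not-real.
p_real = posterior["real"]
p_not_real = posterior["sloppy_fake"] + posterior["elaborate_fake"]
print(posterior, p_real / p_not_real)
```

Note how, in this toy setup, stylometry and metadata barely separate "real" from "elaborate_fake", while the takedown evidence does most of the work against both kinds of fake. That is the screening-off pattern described above, though the actual strength depends entirely on the numbers plugged in.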
Note to self: consider testing the obvious conspiracy theory here.
Thanks for the writeup. I’ll add that as a footnote.