Thanks! I’ve been treating forensic linguistics as a subdiscipline of stylometry, which I mention in the related work section, although it’s hard to know from the outside where particular academic boundaries are drawn. My understanding of both is that they’re primarily concerned with identifying specific authors (as in the case of Kaczynski), but that both include forays into investigating author characteristics like gender. There’s definitely overlap, although those fields tend to use specialized tools, whereas I’m more interested in the capabilities of general-purpose models, since those are where most of the overall risk comes from.
> If LLMs are superhuman at this kind of work
To be clear, I don’t think that’s been shown as yet; I’m personally uncertain at this point. I would be surprised if they didn’t become clearly superhuman at it within another generation or two, even in the absence of any overall capability breakthroughs.
> I could imagine, for example, that an authoritarian regime might have a lot of incentive to de-anonymize people.
Absolutely agreed. In my view, the majority of nearish-term privacy risk comes from a mix of state and corporate privacy invasion, with a healthy sprinkling of blackmail (though again, I’m personally less concerned about the misuse risk than about the deception/manipulation risk, both from misuse and from possibly misaligned models).