Yeah for sure!
For PII, a relatively recent survey paper: https://arxiv.org/pdf/2403.05156
For PII/memorization generally:
https://arxiv.org/pdf/2302.00539
https://arxiv.org/abs/2202.07646
Labs’ LLM safety sections typically have a PII/memorization section
For demographics inference:
https://ieeexplore.ieee.org/document/9152761
For bias/fairness, a survey paper: https://arxiv.org/pdf/2309.00770
This is probably far from complete, but I think the references in the survey paper, and in the Staab et al. paper should have some additional good ones as well.
Thanks!
I’ve seen some of the PII/memorization work, but I think that problem is distinct from what I’m trying to address here; what I’m most interested in is what the model can infer about someone who doesn’t appear in the training data at all. In practice it can be hard to distinguish those cases, but conceptually I see them as pretty distinct.
The demographics link (‘Privacy Risks of General-Purpose Language Models’) is interesting and I hadn’t seen it, thanks! It seems fairly different from what I’m trying to look at, in that they’re looking at models’ ability to reconstruct text sequences (including e.g. genome sequences), whereas I’m looking at what the model can infer about users/authors.
Bias/fairness work is interesting and related, but aimed in a somewhat different direction: my interest in inferring demographic characteristics isn’t primarily that such inference can have bias consequences (although it’s certainly valuable to try to prevent bias!). For me they’re primarily a relatively easy-to-measure proxy for broader questions about what the model is able to infer about users from their text. In the long run I’m much more interested in what the model can infer about users’ beliefs, because that’s what enables the model to be deceptive or manipulative.
I’ve focused here on differences between the work you linked and what I’m aiming toward, but those are still all helpful references, and I appreciate you providing them!