Of the abilities Janus demoed to me, this is probably the one that most convinced me GPT-3 does deep modeling of the data generator. The formulation they showed me guessed which famous authors an unknown author is most similar to. This is more useful because it doesn’t require the model to know who the unknown author in particular is, just to know some famous author who is similar enough to invite comparison.
Twitter post I wrote about it:
https://x.com/jd_pressman/status/1617217831447465984
The prompt if you want to try it yourself. It used to be hard to find a base model to run this on but should now be fairly easy with LLaMa, Mixtral, et al.
https://gist.github.com/JD-P/632164a4a4139ad59ffc480b56f2cc99
Interesting! Tough to test at scale, though, or score in any automated way (which is something I’m looking for in my approaches, although I realize you may not be).
Oh, that seems easy enough. People might think that they are safe as long as they don’t write as much as I or Scott do under a few names, but that’s not true. If you have any writing samples at all, you just stick the list of them into a prompt and ask about similarity. Even if you have a lot of writing, context windows are now millions of tokens long, so you can stick an entire book (or three) of writing into a context window.
And remember, the longer the context window, the more that the ‘prompt’ is simply an inefficient form of pretraining, where you create the hidden state of an RNN for millions of timesteps, meta-learning the new task, and then throw it away. (Although note even there that Google has a new ‘caching’ feature which lets you run the same prompt multiple times, essentially reinventing caching RNN hidden states.) So when you stick corpuses into a long prompt, you are essentially pretraining the LLM some more, and making it as capable of identifying a new author as it already is of identifying ‘gwern’ or ‘Scott Alexander’.
So, you would simply do something like put in a list of (author, sample) pairs, along with any convenient additional metadata like biographies, then the ‘unknown sample’, and ask: ‘rank the authors by how likely they are to have written that final sample by an unknown author’.
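Concretely, a minimal sketch of how such a prompt might be assembled (the authors, samples, and final instruction are all placeholders, and the completion call is left to whatever base model or API you have access to):

```python
# Minimal sketch of the ranking prompt described above; everything here is a
# placeholder rather than a tested recipe.
candidates = [
    ("Author A", "A few paragraphs of Author A's writing..."),
    ("Author B", "A few paragraphs of Author B's writing..."),
    ("Author C", "A few paragraphs of Author C's writing..."),
]
unknown_sample = "The anonymous text whose author we want to pin down..."

parts = []
for name, sample in candidates:
    parts.append(f"Author: {name}\nSample:\n{sample}\n")
parts.append(f"Unknown sample:\n{unknown_sample}\n")
parts.append(
    "Rank the authors above by how likely each one is to have written the "
    "unknown sample, most likely first, with a one-sentence justification each."
)
prompt = "\n".join(parts)
print(prompt)  # feed this to the LLM of your choice
```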
This depends on having a short list of authors which can fit in the prompt (the shorter the samples, the more you can fit, but the worse the prediction), but it’s not hard to imagine how to generalize this to an entire list. You can think of it as a noisy sorting problem or a best-arm identification problem. Just break up your entire list of n authors into groups of m and start running the identification prompt; this will not cost n log n prompts, because you’re not sorting the entire list, only finding the min/max (which is roughly linear). For many purposes, it would be acceptable to pay a few dozen dollars to dox an author out of a list of a few thousand candidates.
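A rough sketch of that reduction, where `rank_authors` is a hypothetical wrapper around the ranking prompt above; the total cost is roughly n/(m-1) prompts rather than n log n:

```python
# Rough sketch of the linear reduction: run the ranking prompt on groups of
# `group_size` candidates and let only each group's winner advance, so the
# total number of prompts is roughly n / (group_size - 1).
# `rank_authors(group, unknown_sample)` is a hypothetical wrapper around the
# prompt above that returns the group ordered most- to least-likely.
def find_most_likely_author(candidates, unknown_sample, rank_authors, group_size=8):
    pool = list(candidates)
    while len(pool) > 1:
        winners = []
        for i in range(0, len(pool), group_size):
            group = pool[i:i + group_size]
            ranking = rank_authors(group, unknown_sample)  # one prompt per group
            winners.append(ranking[0])
        pool = winners
    return pool[0]
```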
djb admonishes us always to remember to ask about amortization or economies of scale in attacks, and that’s true here too, of course, for stylometric attacks. If we simply do the obvious lazy sort, we are throwing away all of the useful similarity information that the LLM could be giving us. We could instead work on embedding authors by similarity using comparisons. We could, say, input 3 authors at a time, and ask “is author #1 more similar to #2, or #3?” Handwaving the details, you can then take a large set of similarity rankings, and infer an embedding which maximizes the distance between each author while still obeying the constraints. (Using expectation maximization or maybe an integer solver, idk.) Now you can efficiently look up any new author as a sort of nearest-neighbors lookup problem by running relatively few comparison prompts, homing in on the set of author-points the new author is nearest, and using that small set for a final direct question.
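Hand-waving further, here is one illustrative way to recover such an embedding from triplets of the form “the anchor author reads more like #2 than #3”: plain gradient descent on a margin loss, standing in for the EM or integer-solver variants mentioned above.

```python
# Illustrative sketch of recovering a low-dimensional author embedding from
# triplet judgments (anchor, judged-closer, judged-farther), i.e. the LLM said
# the anchor reads more like the second author than the third.
import torch

def embed_from_triplets(n_authors, triplets, dim=8, steps=2000, margin=1.0):
    emb = torch.randn(n_authors, dim, requires_grad=True)
    opt = torch.optim.Adam([emb], lr=0.05)
    for _ in range(steps):
        a, p, n = zip(*triplets)  # author indices: anchor, closer, farther
        d_close = (emb[list(a)] - emb[list(p)]).norm(dim=1)
        d_far = (emb[list(a)] - emb[list(n)]).norm(dim=1)
        loss = torch.relu(d_close - d_far + margin).mean()  # hinge on the margin
        opt.zero_grad()
        loss.backward()
        opt.step()
    return emb.detach()  # new authors get placed with a few more comparisons
```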
(All this assumes you are trying to leverage a SOTA LLM which isn’t directly accessible. If you use an off-the-shelf LLM like a LLaMA-3, you would probably do something more direct like train a triplet loss on the frozen LLM using large text corpuses and get embeddings directly, making k-NN lookups effectively free & instantaneous. In conclusion, text anonymity will soon be as dead as face anonymity.)
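A sketch of that more direct variant, assuming a locally runnable open-weight model via HuggingFace transformers; the checkpoint name, mean-pooling, and head width are illustrative assumptions rather than a tested recipe:

```python
# Sketch of the direct variant: a frozen open-weight LLM as feature extractor,
# a small projection head trained with a triplet loss on (anchor, same-author,
# different-author) text triples, then k-NN lookup over author embeddings.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.neighbors import NearestNeighbors

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
tok.pad_token = tok.eos_token  # LLaMA tokenizers ship without a pad token
lm = AutoModel.from_pretrained("meta-llama/Meta-Llama-3-8B").eval()

@torch.no_grad()
def frozen_features(texts):
    batch = tok(texts, padding=True, truncation=True, max_length=2048,
                return_tensors="pt")
    hidden = lm(**batch).last_hidden_state         # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)    # mean-pooled, frozen

head = torch.nn.Linear(lm.config.hidden_size, 256)  # the only trained part
loss_fn = torch.nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(head.parameters(), lr=1e-4)

def train_step(anchor_texts, same_author_texts, other_author_texts):
    a = head(frozen_features(anchor_texts))
    p = head(frozen_features(same_author_texts))
    n = head(frozen_features(other_author_texts))
    loss = loss_fn(a, p, n)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# After training, lookups are just nearest neighbors in embedding space, e.g.:
#   index = NearestNeighbors(n_neighbors=5).fit(known_author_embeddings)
#   dists, idxs = index.kneighbors(
#       head(frozen_features([unknown_text])).detach().numpy())
```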
Oh, absolutely! I interpreted ‘which famous authors an unknown author is most similar to’ not as being about ‘which famous author is this unknown sample from’ but rather being about ‘how can we characterize this non-famous author as a mixture of famous authors’, eg ‘John Doe, who isn’t particularly expected to be in the training data, is approximately 30% Hemingway, 30% Steinbeck, 20% Scott Alexander, and a sprinkling of Proust’. And I think that problem is hard to test & score at scale. Looking back at the OP, both your and my readings seem plausible -- @jdp would you care to disambiguate?
LLMs’ ability to identify specific authors is also interesting and important; it’s just not the problem I’m personally focused on, both because I expect that only a minority of people are sufficiently represented in the training data to be identifiable, and because there’s already plenty of research out there on author identification, whereas ability to model unknown users based solely on their conversation with an LLM seems both important and underexplored.
> And I think that problem is hard to test & score at scale.
The embedding approach would let you pick particular authors to measure distance to and normalize, and I suppose that’s something like a “X% Hemingway, Y% Steinbeck”...
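Purely as a sketch, one way to make that concrete is a softmax over negative distances to a chosen set of reference authors; both the temperature and the reference set are arbitrary choices, which feeds into the objection below:

```python
# One possible way to turn embedding distances into an "X% Hemingway,
# Y% Steinbeck" style description: softmax over negative distances to a
# chosen set of reference authors.
import numpy as np

def author_mixture(unknown_vec, reference_vecs, names, temperature=1.0):
    dists = np.linalg.norm(reference_vecs - unknown_vec, axis=1)
    weights = np.exp(-dists / temperature)
    weights /= weights.sum()
    return {name: round(100 * w, 1) for name, w in zip(names, weights)}
```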
Although I think the bigger problem is, what does that even mean and why do you care? Why would you care if it was 20% Hemingway / 40% Steinbeck, rather than vice-versa, or equal, if you do not care about whether it is actually by Hemingway?
> I expect that only a minority of people are sufficiently represented in the training data to be identifiable
I don’t think that’s true, particularly in a politics/law enforcement context. Many people now have writings on social media. The ones who do not can just be subpoenaed for their text or email histories; in the US, for example, you have basically zero privacy rights in those and no warrant is necessary to order Google to turn over all your emails. There is hardly anyone who matters who doesn’t have at least thousands of words accessible somewhere.
> Although I think the bigger problem is, what does that even mean and why do you care? Why would you care if it was 20% Hemingway / 40% Steinbeck, rather than vice-versa, or equal, if you do not care about whether it is actually by Hemingway?
In John’s post, I took it as being an interesting and relatively human-interpretable way to characterize unknown authors/users. You could perhaps use it analogously to eigenfaces.
> There is hardly anyone who matters who doesn’t have at least thousands of words accessible somewhere.
I see a few different threat models here that seem useful to disentangle:
- For an adversary with the resources of, say, an intelligence agency, I could imagine them training or fine-tuning on all the text from everyone’s emails and social media posts, and then yeah, we’re all very deanonymizable (although I’d expect that level of adversary to be using specialized tools rather than a bog-standard LLM).
- For an adversary with the resources of a local police agency, I could imagine them acquiring and feeding in emails & posts from someone in particular if that person has already been promoted to their attention, and thereby deanonymizing them.
- For an adversary with the resources of a local police agency, I’d expect most of us to be non-identifiable if we haven’t been promoted to particular attention.
- And for an adversary with the resources of a typical company or independent researcher, I’d expect most of us to be non-identifiable even if we have been promoted to particular attention.
It’s not something I’ve tried to analyze or research in depth; those are just my current impressions. Quite open to being shown I’m wrong about one or more of those threat models.
Will read this in detail later when I can, but on first skim—I’ve seen you draw that conclusion in earlier comments. Are you assuming you yourself will finally be deanonymized soon? No pressure to answer, of course; it’s a pretty personal question, and answering might itself give away a bit or two.
I can be deanonymized in other ways more easily.
I write these as warnings to other people who might think that it is still adequate to simply use a pseudonym, write exclusively in text, and not make the obvious OPSEC mistakes, and that you can therefore safely write under multiple names. It is not, because within a few years you will have already lost.
Regrettable as it is, if you wish to write anything online which might invite persecution over the next few years or lead activists/newspapers-of-record to try to dox you—if you are, say, blowing the whistle at a sophisticated megacorp with the most punitive NDAs & equity policies in the industry—you would be well-advised to start laundering your writings through an LLM yesterday, despite the deplorable effects on style. Truesight will only get keener and flense away more of the security by obscurity we so take for granted, because “attacks only get better”.
I wouldn’t be surprised if, within a few years, the models of tomorrow can identify specific individual users of today’s models from what is effectively prompt reflection in the outputs, for any prompt that isn’t trivial or simplistic.
For example, I’d be willing to bet I could spot the Claude outputs from janus vs most other users, and I’m not a quasi-magical correlation machine that’s getting exponentially better.
A bit like how everyone assumed Bitcoin used with tumblers was ‘untraceable’ until it turned out it wasn’t.
Anonymity is very likely dead for any outputs kept in long-term storage, no matter the techniques used; it just isn’t widely realized yet.
Thanks! Doomed though it may be (and I’m in full agreement that it is), here’s hoping that your and everyone else’s pseudonymity lasts as long as possible.