I think limiting attack surface using various types of isolation seems promising, but orthogonal to the intervention we’re describing in this post.
The issue is that for a model to work it needs a lot of I/O bandwidth. You’re thinking: OK, if all it needs to do is emit output tokens to a user window, and those tokens are small in total size over time relative to the model weights, then you can restrict uploads.
But consider how many parallel sessions there are now (millions per month) or will be in the future. Leaks could be coordinated, or randomized across parts of the weights, so that each parallel session leaks a little more.
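As a rough illustration (mine, not from the comment above): if each session leaks a fixed-size chunk, coordinated assignment covers the weights in about one session per chunk, while picking chunks at random costs roughly an extra ln(N) factor (coupon collector). All numbers here are the ones assumed later in the thread.

```python
import math

# Illustrative numbers only, matching the BOTEC below.
model_bytes = 0.64e12       # ~0.64 TB of weights
leak_per_session = 128      # bytes exfiltrated per session

n_chunks = model_bytes / leak_per_session

# Coordinated leak: session i exfiltrates chunk i, so full coverage
# takes about n_chunks sessions.
coordinated_sessions = n_chunks

# Randomized leak: each session picks a chunk uniformly at random;
# expected sessions to cover every chunk is ~ n ln n (coupon collector).
randomized_sessions = n_chunks * math.log(n_chunks)

print(f"chunks to cover:             {n_chunks:.2e}")
print(f"coordinated sessions needed: {coordinated_sessions:.2e}")
print(f"randomized sessions needed:  {randomized_sessions:.2e}")
```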
Everyone is using a common golden model as well, and market dynamics push toward a future with just a few models that everyone uses.
Suppose there are 10 million user sessions a month and a leak of 128 bytes per session. The model is 0.64 terabytes (GPT-4, assuming it’s distilled 10x for the T model). Then it takes about 550 months to leak.
If there are 1 billion sessions a month, it leaks in about 5.5 months.
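A quick script reproducing the BOTEC above. The 0.64 TB and 128 bytes/session figures are taken from the comment; reading 0.64 TB as binary tebibytes gives the ~550-month figure, while decimal terabytes would give ~500 months.

```python
# Leak-time BOTEC: months until the full weights are exfiltrated.
model_bytes = 0.64 * 2**40     # ~0.64 TiB; use 0.64e12 for decimal TB (~500 months)
leak_per_session = 128         # bytes leaked per session

for sessions_per_month in (10e6, 1e9):
    bytes_per_month = leak_per_session * sessions_per_month
    months_to_leak = model_bytes / bytes_per_month
    print(f"{sessions_per_month:,.0f} sessions/month -> "
          f"{months_to_leak:.1f} months to leak the full weights")
```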
TLDR: we actually think it’s reasonably likely that the total data outflow is of comparable scale to model weights (for an AI lab’s most capable model) under normal commercial operation.
That these are of comparable scale is a core assumption for the upload-limiting method to make sense (absent the introduction of some more exotic approaches).
Also, I added a BOTEC about viability for GPT-6, showing that naively it seems like we could be fine even with GPT-6 generating an entire book for every human on Earth.
It’s better than nothing.
See this section of the post for commentary.