I think limiting attack surface using various types of isolation seems promising, but orthogonal to the intervention we’re describing in this post.
(As far as isolation proposals go (they aren’t discussed in this post): I more often imagine a datacenter where we have a large number of inference machines which aren’t directly connected to the internet, but instead only connect to a local area network. Then, we can have a single extremely basic machine (e.g. an embedded device/FPGA which doesn’t even have an OS) implement an extremely limited API to connect to the outside world and service users. This API would only implement a small number of primitives (e.g. inference, fine-tuning). This reduces the internet-facing attack surface at the cost of some convenience. However, note that supply chain attacks or hardware attacks on the inference machines are still possible, and attackers could communicate with these compromised machines via encoded messages through the API.)
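To make the "extremely limited API" concrete, here is a rough sketch of the gateway logic. This is only an illustration under assumptions not in the comment above: a real version would be firmware on the embedded device/FPGA rather than Python on a general-purpose OS, and the addresses, field names, and size limit are made up. The point is just that the internet-facing machine understands only a couple of message types and forwards them over the LAN, rejecting everything else.

```python
# Illustrative sketch only: a real version would be firmware on an embedded
# device or FPGA, not Python on a general-purpose OS. The address and field
# names below are hypothetical.
import json
import socket

ALLOWED_OPS = {"inference", "finetune"}        # the only primitives exposed
MAX_REQUEST_BYTES = 1 << 20                    # refuse anything oversized
INTERNAL_INFERENCE_ADDR = ("10.0.0.2", 9000)   # LAN-only inference machine

def handle_external_request(raw: bytes) -> bytes:
    """Validate an internet-facing request and forward it over the LAN."""
    if len(raw) > MAX_REQUEST_BYTES:
        return b'{"error": "request too large"}'
    try:
        req = json.loads(raw)
    except json.JSONDecodeError:
        return b'{"error": "malformed request"}'
    if req.get("op") not in ALLOWED_OPS:
        return b'{"error": "unsupported operation"}'
    # Forward only the whitelisted fields; drop everything else.
    forwarded = json.dumps({"op": req["op"], "payload": req.get("payload")})
    with socket.create_connection(INTERNAL_INFERENCE_ADDR, timeout=30) as s:
        s.sendall(forwarded.encode())
        s.shutdown(socket.SHUT_WR)
        chunks = []
        while chunk := s.recv(65536):
            chunks.append(chunk)
    return b"".join(chunks)
```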
This is an extremely good idea, and you can see physical evidence of our predecessors solving a similar problem all around you.
Have you ever noticed how every electrical panel is in a metal box, and every high-power appliance is in a metal case? How every building has a gap between it and its neighbors?
It’s the same concept applied. Fire is just three ingredients, but humans don’t fully understand the plasma dynamics. What we do know is that you can’t take any chances, and you must subdivide the world with firebreaks and barriers to contain the most likely sources of ignition.
A world that keeps AI from burning out of control is one where hardware ASICs acting as firewalls guard the network interfaces of every cluster capable of hosting an AI model. This reduces how much coordination models can do with each other, and where they can escape to.
You don’t just want to restrict the API; you want to explicitly define what systems a hosted model can communicate with. Ideally a specific session can only reach the paying user, any systems the user has mapped to it, and it does research via a cached copy of the internet rather than the live one, so the AI models cannot coordinate with each other. (So these are more than firewalls and have functions similar to VPN gateways; a rough sketch follows below.)
There would have to be a “fire code” for AI.
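As an illustration of the per-session egress policy described above, here is a minimal sketch. All names and endpoints are hypothetical, and a real deployment would enforce this at the network or hardware layer rather than in application code; the key property is just that the allowlist is defined per session and contains no route to other model-hosting clusters.

```python
# Illustrative sketch of a per-session egress allowlist. All names
# (SessionPolicy, CACHED_INTERNET_PROXY, the example endpoints) are hypothetical.
from dataclasses import dataclass, field

CACHED_INTERNET_PROXY = "cache.internal.example"  # static snapshot, not the live web

@dataclass
class SessionPolicy:
    user_endpoint: str                            # the paying user's connection
    user_mapped_systems: set[str] = field(default_factory=set)

    def egress_allowed(self, destination: str) -> bool:
        """A hosted-model session may only reach its user, systems the user
        explicitly mapped to it, and the cached copy of the internet."""
        return (
            destination == self.user_endpoint
            or destination in self.user_mapped_systems
            or destination == CACHED_INTERNET_PROXY
        )

# Example: a session tied to one user and one of their databases.
policy = SessionPolicy("user-42.clients.example", {"db.user42.example"})
assert policy.egress_allowed("db.user42.example")
assert not policy.egress_allowed("other-cluster.lab.example")  # no model-to-model traffic
```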
The issue is that for a model to work it needs a lot of I/O bandwidth. You’re thinking: OK, if all it needs to do is emit output tokens to a user window, and those tokens are small in size over time relative to the model weights, then you can restrict uploads.
But consider how many parallel sessions there are now (millions per month) or will be in the future. Leaks could be coordinated, or randomized over parts of the weights, so that each parallel session leaks a little more.
Everyone is using a common golden model as well, and market dynamics push toward a future where there are just a few models everyone is using.
Suppose there are 10 million user sessions a month and a leak of 128 bytes per session. The model is 0.64 terabytes (GPT-4, assuming it’s distilled 10x for the T model). Then it takes roughly 550 months to leak.
If there are 1 billion sessions a month, it leaks in about 5.5 months.
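Spelling out that arithmetic (interpreting the 0.64 terabytes as 2^40-byte units, which is what reproduces the ~550 and ~5.5 month figures):

```python
# Reproducing the leak-rate arithmetic above. The 128 bytes/session and
# 0.64 TB weight size are the commenter's assumptions, with TB read as
# 2**40 bytes.
WEIGHT_BYTES = 0.64 * 2**40          # ~0.64 TB of model weights
LEAK_BYTES_PER_SESSION = 128

def months_to_leak(sessions_per_month: float) -> float:
    """Months until the full weights have leaked at the given session rate."""
    return WEIGHT_BYTES / (LEAK_BYTES_PER_SESSION * sessions_per_month)

print(months_to_leak(10_000_000))     # ~550 months at 10M sessions/month
print(months_to_leak(1_000_000_000))  # ~5.5 months at 1B sessions/month
```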
It’s better than nothing.
See this section of the post for commentary.
TLDR: we actually think it’s reasonably likely that the total data outflow is of comparable scale to model weights (for an AI lab’s most capable model) under normal commercial operation.
Thinking these are of comparable scale is a core assumption needed for the upload-limiting method to make sense (without the introduction of some more exotic approaches).
Also, I added a BOTEC (back-of-the-envelope calculation) about viability for GPT-6, showing that naively it seems like we could be fine with GPT-6 generating an entire book for every human on earth.