Did you consider using hypervisors with guest OSes or VMs on a host OS?
I don’t work on cybersecurity directly either, but I have dealt with both of these from the perspective of “how do I make it work with this security measure present”. I also write the software that loads weights onto neural network accelerators and have recently worked on LLM support.
The concept either way is that a guest OS or VM has an accessible range of memory mapped to it. Every memory address it sees is fake: the CPU keeps a mapping table that translates each virtual address to the real physical address, and there can be multiple levels of mapping.
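For intuition, here’s a toy sketch of that mapping step. The table format and numbers are invented, not any real CPU’s page table layout, and real systems nest this (guest virtual → guest physical → host physical); the point is just that the guest only ever works with the left-hand addresses while the hypervisor owns the table.

```python
# Toy single-level address translation, roughly what the MMU does on every access.
# Layout and numbers are made up for illustration, not a real CPU's format.

PAGE_SIZE = 4096  # 4 KiB pages

# Hypervisor-owned mapping: guest page number -> host physical page number.
# The guest never sees or touches this table; it only sees the left-hand side.
guest_page_table = {
    0x000: 0x7A1,
    0x001: 0x3C4,
}

def translate(virtual_addr: int) -> int:
    """Translate a guest virtual address to a host physical address,
    faulting if the guest has no mapping for that page."""
    page_number, offset = divmod(virtual_addr, PAGE_SIZE)
    if page_number not in guest_page_table:
        raise MemoryError(f"page fault: no mapping for page {page_number:#x}")
    return guest_page_table[page_number] * PAGE_SIZE + offset

print(hex(translate(0x0234)))  # guest page 0x0 -> host page 0x7A1, prints 0x7a1234
print(hex(translate(0x1234)))  # guest page 0x1 -> host page 0x3C4, prints 0x3c4234
```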
The other restriction is that the processor’s own control bits and privileged opcodes cannot be accessed by a guest.
What this means in principle is that it doesn’t matter what a guest program does: no instruction will access memory you don’t have privileges for, and you can’t give yourself those privileges.
You would keep the program that can actually see the files containing the model weights, and that owns the GPU or TPU driver, on the host OS or in a privileged guest OS.
On a separate OS, inside another guest or VM, is the Python stack that actually calls the APIs to run the model.
Between the two OSes there are shared memory buffers. Input tokens get written to a circular buffer mapped into one of these shared regions, then the privileged software issues the GPU/TPU driver API calls to DMA-transfer the tokens to the hardware. Same for the output path.
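As a rough sketch of that plumbing, using Python shared memory between two processes as a stand-in for hypervisor-mapped pages (the names, header layout, and sizes are invented for illustration; a real system would use mapped guest pages and a real GPU/TPU driver):

```python
# Toy ring buffer shared between the unprivileged inference stack and the
# privileged process that owns the driver. Everything here is illustrative.
import struct
from multiprocessing import shared_memory

HEADER = 8        # two uint32 counters: bytes written, bytes read
CAPACITY = 4096   # payload capacity of the ring

shm = shared_memory.SharedMemory(create=True, size=HEADER + CAPACITY)
shm.buf[:HEADER] = struct.pack("<II", 0, 0)

def push_tokens(payload: bytes) -> None:
    """Unprivileged side: append token bytes to the ring."""
    w, r = struct.unpack_from("<II", shm.buf, 0)
    assert w - r + len(payload) <= CAPACITY, "ring full"
    for b in payload:
        shm.buf[HEADER + (w % CAPACITY)] = b
        w += 1
    struct.pack_into("<I", shm.buf, 0, w)             # publish new write counter

def drain_tokens() -> bytes:
    """Privileged side: read pending bytes; real code would now issue the
    driver API calls to DMA these bytes to the accelerator."""
    w, r = struct.unpack_from("<II", shm.buf, 0)
    out = bytes(shm.buf[HEADER + (i % CAPACITY)] for i in range(r, w))
    struct.pack_into("<I", shm.buf, 4, r + len(out))  # publish new read counter
    return out

push_tokens(b"hello tokens")
print(drain_tokens())   # b'hello tokens'
shm.close(); shm.unlink()
```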
There are a lot of moving parts, and one common class of vulnerability is to exploit flaws in the underlying implementation, such as disturbing nearby memory cells in the actual DDR.
Note that if there is a leak, it would probably be slow: some hack like rowhammer, or a similar attack that leaks bits from the CPU cache, exposes a small amount of the model weights, and this gets sent somewhere and stored.
If you wanted to reduce leaks, one approach would be to stop relying on software isolation alone and use hardware: use a separate CPU on the same motherboard to host the privileged OS.
A core advantage of bandwidth limiting over other cybersecurity interventions is that it’s a simple system we can make stronger arguments about, implemented on a simple processor, without the complexity and uncertainty of modern processors and OSes.
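To gesture at why that argument can be made simple: the core of the limiter is little more than a byte counter on the outbound path. A minimal illustration follows (not a real implementation, which would live in firmware or on an FPGA in the network path; the class name and budget are invented):

```python
# Illustrative core of an upload limiter: count outbound bytes and refuse to
# forward once a budget is exhausted. Shows how little logic needs to be trusted.
class UploadLimiter:
    def __init__(self, budget_bytes: int):
        self.remaining = budget_bytes

    def forward(self, packet: bytes) -> bool:
        """Return True and forward if within budget, else drop."""
        if len(packet) > self.remaining:
            return False          # over budget: drop (or raise an alarm)
        self.remaining -= len(packet)
        return True

limiter = UploadLimiter(budget_bytes=10_000_000)   # e.g. a 10 MB outbound budget
print(limiter.forward(b"x" * 1500))                # True: within budget
```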
I think limiting attack surface using various types of isolation seems promising, but orthogonal to the intervention we’re describing in this post.
(As far as isolation proposals go (which aren’t discussed in this post), I more often imagine a datacenter where we have a large number of inference machines which aren’t directly connected to the internet, but instead only connect to a local area network. Then, we can have a single extremely basic machine (e.g. an embedded device/FPGA which doesn’t even have an OS) implement an extremely limited API to connect to the outside world and service users. This API could only implement a small number of primitives (e.g. inference, fine-tuning). This reduces the internet-facing attack surface at the cost of some convenience. However, note that supply chain attacks or hardware attacks on the inference machines are still possible, and attackers could communicate with these compromised machines via encoded messages through the API.)
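For intuition, here is a sketch of what “an extremely limited API” might look like on that gateway machine. The primitive names, message format, and helper are all invented for illustration, not a proposal for the actual interface:

```python
# Sketch of the gateway's request handling: only two primitives are recognized,
# everything else is rejected before it ever reaches the inference LAN.
ALLOWED_PRIMITIVES = {"inference", "finetune"}

def handle_request(request: dict) -> dict:
    op = request.get("op")
    if op not in ALLOWED_PRIMITIVES:
        return {"error": "unsupported operation"}   # nothing else is reachable
    # Forward a fixed, re-serialized message to the inference LAN; never pass
    # the raw client bytes through, so the internal machines only ever see
    # well-formed requests of these two shapes.
    return forward_to_inference_lan({"op": op, "payload": request.get("payload", b"")})

def forward_to_inference_lan(message: dict) -> dict:
    # Stub standing in for the LAN-side transport.
    return {"ok": True, "op": message["op"]}

print(handle_request({"op": "inference", "payload": b"tokens"}))  # forwarded
print(handle_request({"op": "shell", "payload": b"rm -rf /"}))    # rejected
```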
This is an extremely good idea, and you can see physical evidence all around you of our predecessors solving a similar problem.
Have you ever noticed how every electrical panel is in a metal box, and every high-power appliance is in a metal case? How every building has a gap between it and its neighbors?
It’s the same concept applied. Fire is just three ingredients, but humans don’t fully understand the plasma dynamics. What we do know is that you can’t take any chances, and you must subdivide the world with firebreaks and barriers to contain the most likely sources of ignition.
A world that keeps AI from burning out of control is one where these hardware ASICs, acting as firewalls, guard the network interfaces of every cluster capable of hosting an AI model. This reduces how much coordination models can do with each other, and where they can escape to.
You don’t just want to restrict the API; you want to explicitly define what systems a hosted model can communicate with. Ideally a specific session can only reach the paying user, any systems that user has mapped to it, and a cached copy of the internet rather than the live one, so that AI models cannot coordinate with each other. (So these are more than firewalls and have functions similar to VPN gateways.)
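A sketch of the kind of per-session policy this implies (the rule structure and addresses are invented for illustration; a real gateway would enforce this at the network layer):

```python
# Per-session egress policy: each session may only reach an explicit allowlist
# of destinations, plus a local internet cache. Illustrative structure only.
SESSION_POLICY = {
    "session-123": {
        "user_endpoint": "10.0.5.17",                # the paying user's connection
        "user_systems": {"10.0.5.20", "10.0.5.21"},  # systems the user mapped in
        "internet_cache": "10.0.9.1",                # cached copy, not the live internet
    },
}

def egress_allowed(session_id: str, destination: str) -> bool:
    policy = SESSION_POLICY.get(session_id)
    if policy is None:
        return False
    allowed = {policy["user_endpoint"], policy["internet_cache"], *policy["user_systems"]}
    return destination in allowed

print(egress_allowed("session-123", "10.0.5.17"))      # True: the paying user
print(egress_allowed("session-123", "93.184.216.34"))  # False: arbitrary internet host
```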
There would have to be a “fire code” for AI.
The issue is that for a model to work it needs a lot of I/O bandwidth. You’re thinking: OK, if all it needs to do is emit output tokens to a user window, then as long as those tokens are small over time relative to the model weights, you can restrict uploads.
But consider how many parallel sessions there are now (millions per month) or will be in the future. Leaks could be coordinated, or randomized across parts of the weights, so that each parallel session leaks a little more.
Everyone is using a common golden model as well, and market dynamics push toward a future where there are just a few models everyone uses.
Suppose there are 10 million user sessions a month and a leak of 128 bytes per session, and the model is 0.64 terabytes (GPT-4, assuming it’s distilled 10x for the T model). Then it takes about 500 months to leak.
If there are 1 billion sessions a month, it leaks in about 5 months.
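Spelling out that arithmetic with the same assumptions:

```python
# Back-of-the-envelope leak time under the assumptions above.
model_bytes = 0.64e12          # 0.64 TB of weights
leak_per_session = 128         # bytes exfiltrated per session

for sessions_per_month in (10e6, 1e9):
    months = model_bytes / (leak_per_session * sessions_per_month)
    print(f"{sessions_per_month:.0e} sessions/month -> {months:,.0f} months to leak")
# 1e+07 sessions/month -> 500 months to leak
# 1e+09 sessions/month -> 5 months to leak
```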
It’s better than nothing.
See this section of the post for commentary.
TL;DR: we actually think it’s reasonably likely that the total data outflow is of comparable scale to the model weights (for an AI lab’s most capable model) under normal commercial operation.
Thinking these are of comparable scale is a core assumption for the upload-limiting method to make sense (without introducing some more exotic approaches).
Also, I added a BOTEC (back-of-the-envelope calculation) about viability for GPT-6, showing that it naively seems like we could be fine with GPT-6 generating an entire book for every human on earth.