It’s really hard to know whether the other party is giving you API access to their most powerful model. If you could somehow verify that the API you are accessing is indeed directly hooked up to their most powerful model, and that the capabilities of that model aren’t being intentionally hobbled to deceive you, then I do think this gets you a lot of the same benefit.
Some of the benefit is still missing, though. I think a lack of moats is a strong disincentive to develop technology: in a race scenario, you might be a lot less tempted to make a mad sprint towards AGI if you think your opponents can catch up almost immediately. So by enabling better moats, you might end up with substantial timeline-accelerating effects.
I do think the lack-of-moat benefit is smaller than the verification benefit.
I think it should be possible to get a good enough verification regime in practice with considerable effort. It’s possible that sufficiently good verification occurs by default due to spies.
I agree that there will potentially be a lot of issues downstream of verification problems by default.
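To make the verification problem a bit more concrete, here is a minimal sketch of one crude check the accessing party could run: score the API on hard held-out tasks and compare against the provider’s claimed benchmark results. Nothing here is from the thread; the endpoint URL, model name, and OpenAI-style response format are placeholder assumptions I’m introducing for illustration.

# A minimal sketch of a crude capability spot-check against a provider's API.
# The endpoint URL, model name, and response format are hypothetical placeholders
# assuming a generic OpenAI-style chat API; the held-out tasks and grader are
# supplied by the verifying party.
import requests

API_URL = "https://example-lab.test/v1/chat/completions"  # hypothetical endpoint
API_KEY = "provided-by-the-other-party"                    # hypothetical credential
MODEL = "claimed-frontier-model"                           # the model they say you're getting

def query(prompt: str) -> str:
    """Send one prompt to the claimed frontier model and return its reply."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": MODEL, "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def spot_check(held_out_tasks: list[dict], grade) -> float:
    """Score the API on hard tasks the provider has never seen.

    `held_out_tasks` is a list of {"prompt": ..., "answer": ...} dicts and
    `grade(reply, answer)` returns True/False. A large gap between this score
    and the provider's claimed benchmark results is (weak) evidence that the
    API is hobbled or routed to a lesser model.
    """
    correct = sum(bool(grade(query(t["prompt"]), t["answer"])) for t in held_out_tasks)
    return correct / len(held_out_tasks)

Of course, a provider could try to detect and game a check like this (e.g. by routing suspected eval traffic to the real model), which is part of why verification is hard without something more robust than black-box querying.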
I think a lack of moats is a strong disincentive to develop technology: in a race scenario, you might be a lot less tempted to make a mad sprint towards AGI if you think your opponents can catch up almost immediately
Hmm, this isn’t really how I model the situation with respect to racing. From my perspective, the question isn’t “security or no security”, but is instead “when will you have extreme security”.
(My response might overlap with tlevin’s, I’m not super sure.)
Here’s an example way things could go:
An AI lab develops a model that begins to accelerate AI R&D substantially (say 10x) while having weak security. This model was developed primarily for commercial reasons and the possibility of it being stolen isn’t a substantial disincentive in practice.
This model is immediately stolen by China.
Shortly after this, USG secures the AI lab.
Now, further AIs will be secure, but to stay ahead of China, which has substantially accelerated its AI R&D and other AI work, USG races to AIs which are much smarter than humans.
In this scenario, if you had extreme security ready to go earlier, then the US would potentially have a larger lead and better negotiating position. I think this probably gets you longer delays prior to qualitatively wildly superhuman AIs in practice.
There is a case that if you don’t work on extreme security in advance, then there will naturally be a pause to implement this. I’m a bit skeptical of this in practice, especially in short timelines. I also think that the timing of this pause might not be ideal—you’d like to pause when you already have transformative AI rather than before.
Separately, if you imagine that USG is rational and at least somewhat aligned, then I think security looks quite good, though I can understand why you wouldn’t buy this.
Hmm, this isn’t really how I model the situation with respect to racing. From my perspective, the question isn’t “security or no security”
Interesting. I guess my model is that the default outcome (in the absence of heroic efforts to the contrary) is indeed “no security against nation-state attackers”, which as far as I can tell is currently the default for practically everything developed using modern computing systems. Getting to a point where you can protect something like the weights of an AI model from nation-state actors would be extraordinarily difficult and an unprecedented achievement in computer security, which is why I don’t expect it to happen (even though many actors would really want it to happen).
My model is that cybersecurity is extremely offense-dominated for anything that requires internet access or that thousands of people have access to (both of which I think are quite likely for deployed weights).
The “how do we know if this is the most powerful model” issue is one reason I’m excited by OpenMined, who I think are working on this, among other features of external access tools.
Interesting. I would have to think harder about whether this is a tractable problem. My gut says it’s pretty hard to build confidence here without leaking information, but I might be wrong.