Zac Hatfield-Dodds comments on Provably Safe AI: Worldview and Projects

Zac Hatfield-Dodds 17 Aug 2024 18:43 UTC
6 points
2
I agree with you that this feels like a ‘compact crux’ for many parts of the agenda. I’d like to take your bet; let me reflect on whether there are any additional operationalizations or conditions.

quick proposals:
- I win at the end of 2026, if there has not been a formally-verified design for a mechanical lock, OR the design does not verify it cannot be mechanically picked, OR less than three consistent physical instances have been manufactured. (e.g. a total of three including prototypes or other designs doesn’t count)
- You win if at the end of 2027, there have been credible and failed expert attempts to pick such a lock (e.g. an open challenge at Defcon). I win if there is a successful attempt.
- Bet resolves neutral, and we each donate half our stakes to a mutually agreed charity, if it’s unclear whether production actually happened, or there were no credible attempts to pick a verified lock.
- Any disputes resolved by the best judgement of an agreed-in-advance arbiter; I’d be happy with the LessWrong team if you and they also agree.
- davekasten 17 Aug 2024 21:58 UTC
  3 points
  2
  Parent
  Unsolicited suggestion: it is probably useful for y’all to define further what “pick a lock” means—e.g., if someone builds a custom defeat device of some sort, that does some sort of activity that is non-destructive but engages in a mechanical operation very surprising to someone thinking of traditional lock-picking methods—does that count?
  
  (I think you’d probably say yes, so long as the device isn’t, e.g., a robot arm that’s nondestructively grabbing the master key for the lock out of Zac’s pocket and inserting it into the lock, but some sort of defining-in-advance would likely help.)
  
  Nonetheless, think this would be awesome as an open challenge at Defcon (I suspect you can convince them to Black Badge the challenge...)
  - habryka 17 Aug 2024 22:27 UTC
    4 points
    0
    Parent
    it is probably useful for y’all to define further what “pick a lock” means
    Well, a lot of the difficulty of any kind of formal proof is indeed that any specification you come up with will have holes you didn’t anticipate. As such, coming up with a solid definition of what “pick a lock” means is a large part of the difficulty of this formal verification endeavor, and as such I don’t think it makes sense to try to get that out of the way before the bet is even made. I think deferring to a trusted third arbiter whether the definition chosen for the proof is indeed an adequate definition would be a better choice.
    - davekasten 18 Aug 2024 0:26 UTC
      1 point
      0
      Parent
      I mean, I think it’s worth doing an initial loose and qualitative discussion to make sure that you’re thinking about overlapping spaces conceptually. Otherwise, not worth the more detailed effort.
- Ben Goldhaber 19 Aug 2024 4:12 UTC
  1 point
  0
  Parent
  This seems mostly good to me, thank you for the proposals (and sorry for my delayed response, this slipped my mind).
  OR less than three consistent physical instances have been manufactured. (e.g. a total of three including prototypes or other designs doesn’t count)
  Why this condition? It doesn’t seem relevant to the core contention, and if someone prototyped a single lock using a GS AI approach but didn’t figure out how to manufacture it at scale, I’d still consider it to have been an important experiment.
  Besides that, I’d agree to the above conditions!
  - Zac Hatfield-Dodds 21 Aug 2024 0:36 UTC
    6 points
    5
    Parent
    I don’t think that a thing you can only manufacture once is a practically usable lock; having multiple is also practically useful to facilitate picking attempts and in case of damage—imagine that a few hours into an open pick-this-lock challenge, someone bent a part such that the key no longer opens the lock. I’d suggest resolving neutral in this case as we only saw a partial attempt.
    
    Other conditions:
    
    I think it’s important that the design could have at least a thousand distinct keys which are non-pickable. It’s fine if the theoretical keyspace is larger so long as the verified-secure keyspace is large enough to be useful, and distinct keys/locks need not be manufactured so long as they’re clearly possible.
    I expect the design to be available in advance to people attempting to pick the lock, just as the design principles and detailed schematics of current mechanical locks are widely known—security through obscurity would not demonstrate that the design is better, only that as-yet-secret designs are harder to exploit.
    
    I nominate @raemon as our arbiter, if both he and you are willing, and the majority vote or nominee of the Lightcone team if Raemon is unavailable for some reason (and @habryka approves that).
    What links here?
    Noosphere89's comment on The Hopium Wars: the AGI Entente Delusion by Max Tegmark (14 Oct 2024 1:51 UTC; 6 points)
    - Raemon 22 Aug 2024 0:37 UTC
      2 points
      0
      Parent
      (note: for @mentioning to work, you need to be in the LessWrong Docs editor, or in markdown you actually type out [@Raemon](https://www.lesswrong.com/users/raemon?mention=user). The “@” in “@Raemon” doesn’t actually do anything; the important part is that the URL has mention=user. We should probably try to make this work more intuitively in markdown, but it’s not quite obvious how to do it.)
      I think in practice my take here would probably be something like a deferral to the Lightcone Team majority vote, since I’m not very informed on this field, but I’m happy to own the metacognition of making sure that happens and sanity checking the results.
      - Zac Hatfield-Dodds 22 Aug 2024 2:56 UTC
        2 points
        0
        Parent
        That works for me—thanks very much for helping out!
    - Ben Goldhaber 21 Aug 2024 23:50 UTC
      1 point
      0
      Parent
      @Raemon works for me; and I agree with the other conditions.
      - Zac Hatfield-Dodds 22 Aug 2024 3:47 UTC
        4 points
        0
        Parent
        I think we’re agreed then, if you want to confirm the size? Then we wait for 2027!
        Ben Goldhaber 3 Sep 2024 17:14 UTC
        1 point
        0
        Parent
        Given your rationale, I’m on board with requiring that 3 or more consistent physical instances of the lock have been manufactured.
        
        Let’s ‘lock’ it in.
        Ben Goldhaber 1 Apr 2025 20:34 UTC
        5 points
        2
        Parent
        fyi @Zac Hatfield-Dodds my probability has fallen below 10%. I expected at least one relevant physical<>cyber project to have started in the past six months; since none has, I doubt this will make the timeline. While not conceding (because I’m still unsure how far AI uplift alone gets us), it seems right to note the update.
        Zac Hatfield-Dodds 4 Sep 2024 9:56 UTC
        2 points
        0
        Parent
        Nice! I look forward to seeing how this resolves.
        
        Ah, by ‘size’ I meant the stakes, not the number of locks—did you want to bet the maximum $1k against my $10k, or some smaller proportional amount?
        Ben Goldhaber 4 Sep 2024 17:40 UTC
        1 point
        0
        Parent
        Ah gotcha, yes, let’s do my $1k against your $10k.
        Zac Hatfield-Dodds 4 Sep 2024 18:22 UTC
        3 points
        1
        Parent
        Locked in! Whichever way this goes, I expect to feel pretty good about both the process and the outcome :-)
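
The resolution rules agreed across this thread can be sketched as a small decision function. This is only an illustrative reading, not an official statement of the bet: the names `LockEvidence`, `Outcome`, `resolve`, and `break_even_probability` are hypothetical, the order in which the conditions are checked is an assumption, and the agreed arbiter's judgement would override any mechanical reading like this one.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ZAC_WINS = "Zac wins"
    BEN_WINS = "Ben wins"
    NEUTRAL = "neutral"


@dataclass
class LockEvidence:
    # All fields are hypothetical inputs that the arbiter would have to judge.
    verified_design: bool       # a formally verified lock design exists by end of 2026
    proof_covers_picking: bool  # the verification states it cannot be mechanically picked
    verified_keyspace: int      # distinct keys covered by the verified-secure design
    consistent_instances: int   # physical instances of the same design that were built
    production_unclear: bool    # it is unclear whether production actually happened
    credible_attempts: int      # credible expert picking attempts by end of 2027
    successful_pick: bool       # at least one of those attempts succeeded


def resolve(e: LockEvidence) -> Outcome:
    """One possible ordering of the conditions discussed in the thread."""
    # End-of-2026 conditions on the design itself.
    if not e.verified_design or not e.proof_covers_picking:
        return Outcome.ZAC_WINS
    if e.verified_keyspace < 1000:
        return Outcome.ZAC_WINS
    # Unclear production resolves neutral (assumed to be checked before the count).
    if e.production_unclear:
        return Outcome.NEUTRAL
    if e.consistent_instances < 3:
        return Outcome.ZAC_WINS
    # End-of-2027 conditions on picking attempts.
    if e.successful_pick:
        return Outcome.ZAC_WINS
    if e.credible_attempts == 0:
        return Outcome.NEUTRAL
    return Outcome.BEN_WINS


def break_even_probability(own_stake: float, counterparty_stake: float) -> float:
    """Win probability at which risking own_stake to win counterparty_stake has zero expected value."""
    return own_stake / (own_stake + counterparty_stake)
```

Under these assumptions, `break_even_probability(1000, 10000)` is 1/11, or roughly 9%, which is the win probability at which the $1k side of the bet has zero expected value.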