(8) won’t be attempted, or will fail at some combination of design, manufacture, or just-being-pickable. This is a great proposal and a beautifully compact crux for the overall approach.
I agree with you that this feels like a ‘compact crux’ for many parts of the agenda. I’d like to take your bet; let me reflect on whether there are any additional operationalizations or conditions I’d want.
However, I believe that the path there is to extend and complement current techniques, including empirical and experimental approaches alongside formal verification—whatever actually works in practice.
FWIW in Towards Guaranteed Safe AI we endorse this: “Moreover, while we have argued for the need for verifiable quantitative safety guarantees, it is important to note that GS AI may not be the only route to achieving such guarantees. An alternative approach might be to extract interpretable policies from black-box algorithms via automated mechanistic interpretability… it is ultimately an empirical question whether it is easier to create interpretable world models or interpretable policies in a given domain of operation.”
I agree with you that this feels like a ‘compact crux’ for many parts of the agenda. I’d like to take your bet; let me reflect on whether there are any additional operationalizations or conditions I’d want.
Quick proposals (a toy encoding of the resolution logic follows below):
I win at the end of 2026 if there has not been a formally-verified design for a mechanical lock, OR the design is not verified to be impossible to pick mechanically, OR fewer than three consistent physical instances have been manufactured (e.g. a total of three spread across prototypes or other designs doesn’t count).
You win if, at the end of 2027, there have been credible but failed expert attempts to pick such a lock (e.g. an open challenge at Defcon). I win if there is a successful attempt.
Bet resolves neutral, and we each donate half our stakes to a mutually agreed charity, if it’s unclear whether production actually happened, or if there were no credible attempts to pick a verified lock.
Any disputes resolved by the best judgement of an agreed-in-advance arbiter; I’d be happy with the LessWrong team if you and they also agree.
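For concreteness, the resolution terms above could be encoded as a small decision procedure. This is a toy sketch only: the `State` fields, the ordering of the neutral checks, and the “proposer”/“counterparty” labels are illustrative assumptions, not part of the bet.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PROPOSER_WINS = "proposer wins"          # the "I" in the terms above
    COUNTERPARTY_WINS = "counterparty wins"  # the "you" in the terms above
    NEUTRAL = "neutral: each side donates half their stake to an agreed charity"


@dataclass
class State:
    verified_design_exists: bool  # formally-verified mechanical lock design, by end of 2026
    unpickability_proved: bool    # the verification covers "cannot be mechanically picked"
    consistent_instances: int     # physical instances of the *same* design manufactured
    production_unclear: bool      # arbiter cannot tell whether production happened
    credible_attempts: bool       # credible expert picking attempts (e.g. at Defcon), by end of 2027
    any_pick_succeeded: bool      # at least one credible attempt opened the lock


def resolve(state: State) -> Outcome:
    # End-of-2026 clause: design, proof, and three consistent physical instances.
    if not (state.verified_design_exists
            and state.unpickability_proved
            and state.consistent_instances >= 3):
        return Outcome.PROPOSER_WINS
    # Neutral clause: unclear production, or no credible picking attempts.
    if state.production_unclear or not state.credible_attempts:
        return Outcome.NEUTRAL
    # End-of-2027 clause: credible attempts were made; did any succeed?
    return Outcome.PROPOSER_WINS if state.any_pick_succeeded else Outcome.COUNTERPARTY_WINS
```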
Unsolicited suggestion: it is probably useful for y’all to define further what “pick a lock” means—e.g., if someone builds a custom defeat device that does something non-destructive but mechanically very surprising to someone thinking of traditional lock-picking methods, does that count?
(I think you’d probably say yes, so long as the device isn’t, e.g., a robot arm that’s nondestructively grabbing the master key for the lock out of Zac’s pocket and inserting it into the lock, but some sort of defining-in-advance would likely help.)
Nonetheless, I think this would be awesome as an open challenge at Defcon (I suspect you can convince them to Black Badge the challenge...)
it is probably useful for y’all to define further what “pick a lock” means
Well, a lot of the difficulty of any kind of formal proof is indeed that any specification you come up with will have holes you didn’t anticipate. Coming up with a solid definition of what “pick a lock” means is therefore a large part of the difficulty of this formal verification endeavor, so I don’t think it makes sense to try to get that out of the way before the bet is even made. I think deferring to a trusted third-party arbiter on whether the definition chosen for the proof is adequate would be a better choice.
I mean, I think it’s worth doing an initial loose and qualitative discussion to make sure that you’re thinking about overlapping spaces conceptually. Otherwise, it’s not worth the more detailed effort.
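To make the specification-holes worry concrete, here is a deliberately tiny model in which “no wrong key opens the lock” can be checked exhaustively. Everything here is an illustrative assumption; the punchline is that the check passes trivially because picking (applying torque, setting pins one at a time) simply isn’t representable in the model, and that gap between model and metal is exactly where a real specification can have holes.

```python
from itertools import product

DEPTHS = range(4)  # four possible cut depths per pin position (tiny on purpose)
PINS = 3           # three pin positions, so the keyspace is exhaustively checkable


def opens(lock_cuts: tuple, key_cuts: tuple) -> bool:
    # Idealized pin tumbler: the lock opens iff every cut matches exactly.
    # Note what is missing: tolerances, torque, probing pins individually...
    return all(k == c for k, c in zip(key_cuts, lock_cuts))


def no_wrong_key_opens(lock_cuts: tuple) -> bool:
    # The "security property": only the matching key opens this lock.
    return all(
        opens(lock_cuts, key) == (key == lock_cuts)
        for key in product(DEPTHS, repeat=PINS)
    )


# Passes for every lock in this model -- but only because the model cannot
# express manipulation at all. The proof is real; the open question is whether
# the model captures what "pick" means in the physical world.
assert all(no_wrong_key_opens(lock) for lock in product(DEPTHS, repeat=PINS))
```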
This seems mostly good to me, thank you for the proposals (and sorry for my delayed response, this slipped my mind).
OR fewer than three consistent physical instances have been manufactured (e.g. a total of three spread across prototypes or other designs doesn’t count).
Why this condition? It doesn’t seem relevant to the core contention, and if someone prototyped a single lock using a GS AI approach but didn’t figure out how to manufacture it at scale, I’d still consider it to have been an important experiment. Besides that, I’d agree to the above conditions!
I don’t think that a thing you can only manufacture once is a practically usable lock; having multiple is also practically useful to facilitate picking attempts and in case of damage—imagine that a few hours into an open pick-this-lock challenge, someone bent a part such that the key no longer opens the lock. I’d suggest resolving neutral in this case, as we would only have seen a partial attempt.
Other conditions:
I think it’s important that the design could have at least a thousand distinct keys which are non-pickable. It’s fine if the theoretical keyspace is larger, so long as the verified-secure keyspace is large enough to be useful, and distinct keys/locks need not be manufactured so long as they’re clearly possible (a rough keyspace illustration follows after these conditions).
I expect the design to be available in advance to people attempting to pick the lock, just as the design principles and detailed schematics of current mechanical locks are widely known—security through obscurity would not demonstrate that the design is better, only that as-yet-secret designs are harder to exploit.
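As a back-of-the-envelope illustration of the keyspace condition (every parameter here is made up): a pin-tumbler-style design with a modest number of pins and depths has a large theoretical keyspace, and even after manufacturing-style constraints cut it down, well over a thousand distinct keys can remain.

```python
from itertools import product

DEPTHS, PINS = 8, 5  # hypothetical design parameters
MACS = 4             # max adjacent cut difference, a stand-in manufacturing constraint

theoretical = DEPTHS ** PINS  # 8**5 = 32768 cut combinations


def usable(cuts) -> bool:
    # Stand-in for whatever constraints the verified design places on keys.
    return all(abs(a - b) <= MACS for a, b in zip(cuts, cuts[1:]))


verified_secure = sum(usable(c) for c in product(range(DEPTHS), repeat=PINS))
assert verified_secure >= 1000  # the condition proposed above
print(f"theoretical: {theoretical}, usable: {verified_secure}")
```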
I nominate @raemon as our arbiter, if both he and you are willing, and the majority vote or nominee of the Lightcone team if Raemon is unavailable for some reason (and @habryka approves that).
(note: for @mentioning to work, you need to be in the LessWrong Docs editor, or in markdown you actually type out [@Raemon](https://www.lesswrong.com/users/raemon?mention=user) (the “@” in “@Raemon” doesn’t actually do anything; the important part is that the url has mention=user). We should probably try to make this work more intuitively in markdown, but it’s not quite obvious how to do it.)
I think in practice my take here would probably be something like a deferral to the Lightcone Team majority vote, since I’m not very informed on this field, but I’m happy to own the metacognition of making sure that happens and sanity checking the results.
That works for me—thanks very much for helping out!
@Raemon works for me, and I agree with the other conditions.
I think we’re agreed then, if you want to confirm the size? Then we wait for 2027!
Given your rationale, I’m on board with requiring three or more consistent physical instances of the lock to have been manufactured.
Let’s ‘lock’ it in.
Nice! I look forward to seeing how this resolves.
Ah, by ‘size’ I meant the stakes, not the number of locks—did you want to bet the maximum $1k against my $10k, or some smaller proportional amount?
Ah gotcha, yes, let’s do my $1k against your $10k.
Locked in! Whichever way this goes, I expect to feel pretty good about both the process and the outcome :-)