From my experience with this kind of thing, I would assume that even if I did a complete series of tests with the ramp and cup in a lab, verified a 100/100 capture rate, and measured everything to the millimeter... just moving the setup to where the demo is held and having one try is probably enough to have at least a 10 percent chance of a miss.
By Murphy’s law, that 10 percent is more like 95 percent.
If we really expect to launch AIs with this kind of ability to misbehave, we should just carve our tombstones.
So, there’s this thing that I think often happens where someone tries to do X a bunch of times, and fails, and sees a bunch of people around them try to do X and fail, and eventually learns the lesson that X basically just doesn’t work in practice. But in fact X can work in practice; it just requires a skillset which (a) is rare, and (b) the people in question did not have.
I think “get technical thing right on the first try” is an X for which this story is reasonably common.
One intended takeaway of the workshop is that, while the very large majority of people do fail (Alison’s case is roughly the median), a nonzero fraction succeed, and not just by sheer luck either. They succeed by tackling the problem in a different way.
I agree, for instance, that running a series of tests with the ramp and cup in a lab, and then just moving the setup to where the demo is held, is probably enough to have at least a 10 percent chance of a miss. Someone with a Robert-like mindset would not just rely on the lab results directly generalizing to the demo environment; they’d re-test parts of the system in the demo environment (without doing a full end-to-end test).
Also, relevant side-fact: David looked it up while we were writing this post, and we’re pretty sure the Manhattan Project’s nuke worked on their first full live test.
Twice. The Hiroshima device was untested with a live core; they must have tested everything about the “gun” except loading it with live U-235. The Trinity test was to test plutonium implosion, which they were concerned about since it requires precise timing and one bad detonator would cause the implosion to fail. So that was also their first live test.
On the other hand, the Castle Bravo test was rather close to what we are afraid of for AI safety. It was meant to be 6 megatons and delivered about 2.5 times that yield, because lithium-7, which the designers had treated as essentially inert, turned out to contribute far more bang than expected. It would be analogous to retraining an AGI with a “few minor tweaks” to the underlying network architecture and getting dangerous superintelligence instead of a slight improvement on the last model.
They would have needed to recreate fusion conditions, which in 1953 were available pretty much only inside a detonating nuclear device. The National Ignition Facility is an example of the kind of equipment you need to research fusion if you don’t want to detonate a nuke.
The Trinity test was preceded by a full test with the Pu replaced by some other material. The inert test was designed to check whether they were getting the needed compression. (My impression is that this was not publicly known until relatively recently.)
I know they did many tries for the implosion mechanism. Didn’t know they did a full “dress rehearsal” where it sounds like they had every component including the casing present. Smart.
My point is that there was still at least a 10 percent chance of failure even if you do all that. There are so many variables that just one dress-rehearsal test is inadequate. You would almost have to have robots make several hundred complete devices and test the implosion on them all to improve your odds. (And even today robots are incapable of building something this complex.)
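To put a rough number on the “several hundred devices” intuition, here is a back-of-the-envelope sketch (the 95% confidence level and the failure-rate targets below are my own illustrative choices, not figures from this thread): with zero failures observed in n trials, the classic rule of three only lets you bound the true failure rate at roughly 3/n, so demonstrating a sub-1% miss rate takes on the order of 300 clean end-to-end runs.

```python
from math import ceil, log

def trials_needed(max_failure_rate: float, confidence: float = 0.95) -> int:
    """Consecutive failure-free trials needed so that, if all of them succeed,
    a true failure rate >= max_failure_rate is ruled out at the given confidence.
    Solves (1 - p)^n <= 1 - confidence for n."""
    p = max_failure_rate
    return ceil(log(1 - confidence) / log(1 - p))

# One flawless dress rehearsal establishes very little; hundreds of runs are
# needed before you can claim even 99% per-shot reliability with much confidence.
for p in (0.10, 0.01, 0.001):
    print(f"to show failure rate < {p}: need {trials_needed(p)} clean trials")
```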
The comparison between the calculations saying igniting the atmosphere was impossible and the catastrophic mistake on Castle Bravo is apposite, as the initial calculations for both were done by the same people at the same gathering!
One out of two isn’t bad, right?
https://twitter.com/tobyordoxford/status/1659659089658388545
I likewise thought the post would consist of people trying increasingly sophisticated approaches and always failing because of messy implementation details.
In e.g. lab experiments, you get to control the experimental setup and painstakingly optimize it to conform to whatever idealized conditions your equations are adapted for. Something similar is often done in industry: we often try to screen away the messiness, either by transforming the environments our technologies are deployed in (roads for cars), or by making the technology’s performance ignore rather than adapt to the messiness (planes ignoring the ground conditions entirely). I expected the point of the exercise to be showing what it looks like when you’re exposed to reality’s raw messiness unprotected, even in an experimental setup as conceptually simple and well-understood as that.
And with “do it on the first try” on top...
But it sounds like there was a non-negligible success rate? That’s a positive surprise for me.
(Although I guess Robert’s trick is kind of “screening away the messiness”, in that he gets to ignore the ramp’s complicated mechanics and just grab the only bit of data he needs. Kinda interested what the actual success rate on this workshop was and what strategies the winners tried. @johnswentworth?)
Kinda interested what the actual success rate on this workshop was and what strategies the winners tried.
Success rate is ~5-15%. Half of that is people who basically get lucky—the most notable such occasion was someone who did the simplest possible calculation, but dropped a factor of 2 at one point, and that just happened to work perfectly with that day’s ramp setup.
Estimating the ball’s speed from video is the main predictor of success; people who’ve done that have something like a 50% success rate (n=4 IIRC). So people do still fail using that approach—for instance, I had one group take the speed they estimated from the video, and the speed they estimated from the energy calculation, and average them together, basically as a compromise between two people within the group. Another had the general right idea but just didn’t execute very well.
Notably, the ball does consistently land in the same spot, so if one executes the right strategy well then basically-zero luck is required.
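For concreteness, here is a minimal sketch of that video-based strategy; the frame rate, pixel scale, ramp height, and pixel positions are all invented for illustration, since the post doesn’t give them. The idea is just to read the ball’s horizontal speed off the last few frames before it leaves the ramp, then treat everything after the lip as plain projectile motion.

```python
import math

# Invented example numbers; the real workshop values are not given in the post.
FPS = 240.0            # slow-motion frame rate of the phone video
M_PER_PIXEL = 5e-4     # calibration from something of known size in frame, metres per pixel
EXIT_HEIGHT = 0.90     # height of the ramp's lip above the floor, metres

# Horizontal pixel positions of the ball in the last few frames before it leaves the ramp.
pixel_x = [412, 418, 424, 430, 436]

# Average displacement per frame -> exit speed (assuming the lip launches it horizontally).
dx_per_frame = (pixel_x[-1] - pixel_x[0]) / (len(pixel_x) - 1)
v_exit = dx_per_frame * M_PER_PIXEL * FPS            # m/s

# After the ramp, only this speed and the drop height matter (air resistance ignored).
t_fall = math.sqrt(2 * EXIT_HEIGHT / 9.81)           # time to fall to the floor, seconds
landing_x = v_exit * t_fall                          # horizontal distance from the lip, metres

print(f"exit speed ≈ {v_exit:.2f} m/s, place the cup ≈ {landing_x:.2f} m out")
```

The frictionless textbook estimate √(2gh) would presumably overshoot this, since it ignores rolling and losses in the flexible ramp; the measurement from video sidesteps all of that, which is plausibly why it is the load-bearing step.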
I expected the point of the exercise to be showing what it looks like when you’re exposed to reality’s raw messiness unprotected, even in an experimental setup as conceptually simple and well-understood as that.
Yup, that is indeed the point.
… Which is a whole different lesson:
for instance, I had one group take the speed they estimated from the video, and the speed they estimated from the energy calculation, and average them together, basically as a compromise between two people within the group
If you only care about betting odds, then feel free to average together mutually incompatible distributions reflecting mutually exclusive world-models. If you care about planning then you actually have to decide which model is right or else plan carefully for either outcome.
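A toy numeric version of that point, with made-up landing positions and cup size: if the video-based model says the ball lands at 0.8 m and the energy-based model says 1.2 m, a cup centred on the 1.0 m average misses under both models, whereas committing to one model (or planning for each outcome separately) at least wins in some world.

```python
def cup_catches(cup_center: float, landing: float, cup_radius: float = 0.05) -> bool:
    """Does a cup of the given radius (metres) catch a ball that lands at `landing`?"""
    return abs(landing - cup_center) <= cup_radius

# Two mutually exclusive models of where the ball lands (invented numbers).
predictions = {"video estimate": 0.80, "energy estimate": 1.20}
average = sum(predictions.values()) / len(predictions)   # 1.00 m

for model, landing in predictions.items():
    outcome = "catch" if cup_catches(average, landing) else "miss"
    print(f"cup at the {average:.2f} m average, world where the {model} is right: {outcome}")

# Output: a miss in both worlds. The averaged number is a fine summary for betting,
# but as a plan it is worse than either model taken seriously on its own.
```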
Kinda. Part of the lesson here is that only the velocity vector at ramp exit matters. At these speeds air resistance is negligible. The problem subdivides.
But the other part is that you had to measure that vector separately from the complex part: the actual flexible plastic ramp someone built. Forget doing it on paper, or having a 30-year accelerator-ramp moratorium.
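To back up the air-resistance claim with a quick order-of-magnitude check (ball size, mass, and exit speed are guesses, since the post doesn’t describe the ball): even for a fairly light ball at these speeds, drag deceleration is a tiny fraction of g over a sub-second flight.

```python
import math

# Guessed parameters for a small ball leaving the ramp; none of these come from the post.
DIAMETER = 0.025     # m
MASS = 0.010         # kg (a light ball; a steel ball would make drag matter even less)
SPEED = 2.0          # m/s, rough upper bound on exit speed
RHO_AIR = 1.2        # kg/m^3
C_DRAG = 0.47        # drag coefficient of a sphere, roughly

area = math.pi * (DIAMETER / 2) ** 2
drag_force = 0.5 * RHO_AIR * C_DRAG * area * SPEED ** 2   # N, at the moment of exit
drag_decel = drag_force / MASS                            # m/s^2

print(f"drag deceleration ≈ {drag_decel:.3f} m/s^2, about {100 * drag_decel / 9.81:.1f}% of g")
# For these numbers it comes out around half a percent of g, so over a ~0.4 s flight
# the landing point shifts by a few millimetres at most; the ramp, not the air,
# is where the real uncertainty lives.
```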