So, there’s this thing that I think often happens where someone tries to do X a bunch of times, and fails, and sees a bunch of people around them try to do X and fail, and eventually learns the lesson that X basically just doesn’t work in practice. But in fact X can work in practice, it just requires a skillset which (a) is rare, and (b) the people in question did not have.
I think “get technical thing right on the first try” is an X for which this story is reasonably common.
One intended takeaway of the workshop is that, while the very large majority of people do fail (Alison’s case is roughly the median), a nonzero fraction succeed, and not just by sheer luck either. They succeed by tackling the problem in a different way.
I agree, for instance, that running a series of tests with the ramp and cup in a lab, and then just moving the setup to where the demo is held, is probably enough to have at least a 10 percent chance of a miss. Someone with a Robert-like mindset would not just rely on the lab results directly generalizing to the demo environment; they’d re-test parts of the system in the demo environment (without doing a full end-to-end test).
Also, relevant side-fact: David looked it up while we were writing this post, and we’re pretty sure the Manhattan Project’s nuke worked on their first full live test.
Twice. The Hiroshima device was never tested with a live core; they presumably tested everything about the “gun” mechanism short of loading it with live U-235. The Trinity test existed to test plutonium implosion, which they were worried about since it requires precise timing: one bad detonator and the implosion fails. So that was also a first live test.
On the other hand, the Castle Bravo test was rather close to what we are afraid of for AI safety. It was predicted to yield about 6 megatons and actually yielded 15, two and a half times the prediction, because fusion reactions involving lithium-7 (which the designers had assumed was essentially inert) produced far more bang than expected. It would be analogous to retraining an AGI with a “few minor tweaks” to the underlying network architecture and getting dangerous superintelligence instead of a slight improvement on the last model.
To have caught that in advance, they would have needed to recreate fusion conditions, which in 1953 were mostly available only inside a detonating nuclear device. The National Ignition Facility is an example of the kind of equipment you need to research fusion if you don’t want to detonate a nuke.
The Trinity test was preceded by a full rehearsal with the plutonium replaced by inert material, designed to check whether they were getting the needed compression. (My impression is that this was not publicly known until relatively recently.)
I knew they did many trial runs of the implosion mechanism. I didn’t know they did a full “dress rehearsal” where, it sounds like, they had every component including the casing present. Smart.
My point is that there was still at least a 10 percent chance of failure even after all that. With so many variables, a single dress-rehearsal test is inadequate. You would almost need robots to build several hundred complete devices and test the implosion on all of them to meaningfully improve your odds. (And even today robots are incapable of building something this complex.)
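To put a rough number on why one rehearsal buys so little: under the (strong) simplifying assumption that each full dress rehearsal is an independent pass/fail trial, here is a minimal sketch of the standard “rule of three” style bound (the function name and the trial counts are just for illustration):

```python
# Back-of-the-envelope: after n consecutive successful dress rehearsals,
# how high could the per-shot failure rate still plausibly be?
# Assumes each rehearsal is an independent pass/fail trial -- a strong
# simplification for a one-off engineering project.

def max_consistent_failure_rate(n_successes: int, confidence: float = 0.95) -> float:
    """Largest failure rate p such that n straight successes are still
    unsurprising, i.e. (1 - p)**n >= 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n_successes)

for n in (1, 10, 30, 300):
    print(f"{n:4d} successes -> failure rate could still be ~{max_consistent_failure_rate(n):.1%}")
```

One clean rehearsal is consistent with a per-shot failure rate as high as roughly 95 percent; it takes about thirty straight successes to rule out a 10 percent failure rate at 95 percent confidence, and a few hundred to push the bound down toward 1 percent, which is the intuition behind “several hundred complete devices.”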
The comparison between the calculations saying igniting the atmosphere was impossible and the catastrophic mistake on Castle Bravo is apposite, as the initial calculations for both were done by the same people at the same gathering!
One out of two isn’t bad, right?
https://twitter.com/tobyordoxford/status/1659659089658388545