Some added context for this list: Nate and Eliezer expect the first AGI developers to encounter many difficulties in the “something forces you to stop and redesign (and/or recode, and/or retrain) large parts of the system” category, with the result that alignment adds significant development time.
By default, safety-conscious groups won’t be able to stabilize the game board before less safety-conscious groups race ahead and destroy the world. To avoid this outcome, humanity needs there to exist an AGI group that…
is highly safety-conscious.
has a large resource advantage over the other groups, so that it can hope to reach AGI with more than a year of lead time — including accumulated capabilities ideas and approaches that it hasn’t been publishing.
has adequate closure and opsec practices, so that it doesn’t immediately lose its technical lead if it successfully acquires one.
The magnitude and variety of the difficulties likely to arise in aligning the first AGI systems also suggest three things: that attempts to align systems as opaque as current SotA systems are very likely to fail; that an AGI developer likely needs to have spent the preceding years deliberately steering toward approaches to AGI that are relatively alignable; and that we need to up our game in general, approaching the problem in ways that are closer to the engineering norms at (for example) NASA than to the engineering norms that are standard in ML today.