I keep trying to explain to people that my threat model, in which improbable-but-possible, discontinuously effective algorithmic progress in AI allows for >1.0 self-improvement cycles at current levels of compute, is a big deal. Most people who disagree argue that it seems quite unlikely to happen soon. I find it helpful to be able to show them the Risk Assessment Matrix, point to the square corresponding to ‘improbable / intolerable’, and say, ‘See? Improbable doesn’t mean safe!’
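To make that matrix point concrete, here is a toy sketch of a risk-matrix lookup in Python. The axis labels and the rating rule are illustrative placeholders rather than anything from a specific safety standard, but they capture the idea that an intolerable outcome is never rated acceptable merely because it is improbable.

```python
# A toy risk-assessment-matrix lookup, just to make the 'improbable /
# intolerable' square concrete. The axis labels and the rating rule are
# illustrative placeholders, not taken from any particular safety standard.

likelihoods = ["frequent", "probable", "occasional", "remote", "improbable"]
severities = ["negligible", "marginal", "critical", "intolerable"]

def rate(likelihood: str, severity: str) -> str:
    """Toy rating rule: an intolerable outcome never gets rated 'acceptable',
    no matter how unlikely it is judged to be."""
    if severity == "intolerable":
        return "must mitigate"  # even in the 'improbable' column
    if severity == "critical" and likelihood in ("frequent", "probable"):
        return "must mitigate"
    return "acceptable with review"

# Print the full matrix; the bottom-right square is the one I point at.
for l in likelihoods:
    for s in severities:
        print(f"{l:>10} / {s:<11} -> {rate(l, s)}")

print(rate("improbable", "intolerable"))  # -> must mitigate, not 'safe'
```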
I like to call the accelerating-change period in which AI helps accelerate further AI, but only by working with humans at a less-than-human-contribution level, the ‘zoom’, in contrast to the super-exponential, AI-independently-improving-itself ‘foom’. Thus, the period we are currently in is the ‘zoom’ period, and the oh-shit-we’re-screwed-if-we-don’t-have-AI-alignment period is the ‘foom’ period. I call the future critical juncture, wherein we realize we could initiate a foom with then-present technology but restrain ourselves because we know we haven’t yet nailed AI alignment, the ‘zoom-foom gap’. This gap could be as little as seconds, while a usually-overconfident capabilities engineer pauses for a moment with their finger over the enter key, or as long as a couple of years, while a new model repeatedly fails the safety evaluations in its secure box despite attempts to align it and thus wisely doesn’t get released. ‘Extending the zoom-foom gap’ is thus a key point of my argument for why we should build a model-architecture-agnostic secure evaluation box and then make it standard practice for all AI research groups to run this safety eval on their new models before deployment.
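As a rough sketch of what that ‘standard practice’ could look like inside a lab’s release pipeline, here is a hypothetical gate in Python. Every name in it (EvalResult, run_safety_evals, release_pipeline, the sandbox label) is a placeholder I’ve invented for illustration, not an existing tool or API.

```python
# Hypothetical sketch of a pre-deployment gate that keeps a new model in a
# secure evaluation box until it passes safety evals. Every name here is a
# made-up placeholder, not an existing library or API.

from dataclasses import dataclass

@dataclass
class EvalResult:
    passed: bool
    notes: str

def run_safety_evals(model_artifact: str, sandbox: str) -> EvalResult:
    """Run the safety evaluation battery on a model inside an isolated
    sandbox. Stubbed out here; the point is the fail-closed default."""
    # Placeholder: until the evals have actually been run and passed,
    # report failure so nothing gets deployed by accident.
    return EvalResult(passed=False, notes="evals not yet run")

def release_pipeline(model_artifact: str) -> None:
    """Deployment is gated on the eval result: a failing model stays in the
    box, which is what 'extending the zoom-foom gap' means in practice."""
    result = run_safety_evals(model_artifact, sandbox="isolated-eval-box")
    if not result.passed:
        print(f"Release of {model_artifact} blocked: {result.notes}")
        return
    print(f"Deploying {model_artifact}")

release_pipeline("new-model-v1")  # -> release blocked
```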
(Note: this is more about buying ourselves time to do alignment research than directly solving the problem. Having more time during the critical period when we have a potential AGI to study, but it isn’t yet loose on the world causing havoc, seems like a big win for alignment research.)