I also agree that direct jumps in capability due to research insight are rare. But in part I think that’s just because things get tried at small scale first, and so there’s always going to be some scaling-up period where the new insight gets fed more and more resources, eventually outpacing the old state of the art. From a coarse-grained perspective, GPT-2 relative to your favorite LSTM model from 2018 is the “jump in capability” due to research insight; it just got there in a not-so-discontinuous way.
Seems right to me.
If some company is partway through scaling up the hot new algorithm and (rather than training to completion) they trip the alarm that was searching for undesirable real-world behavior arising from learned agent-like reasoning, what then?
(I’m not convinced this is a good tripwire, but under the assumption that it is:)
Ideally they have already applied safety solutions, so this doesn’t even happen in the first place. But supposing it did happen, they turn off the AI system, because they remember how Amabook lost a billion dollars when its AI system embezzled money from it, and they start looking into how to fix the issue.
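
To make the tripwire scenario above a bit more concrete, here is a minimal, purely illustrative sketch of a training run that halts early when a behavioral evaluation flags agent-like misbehavior, rather than training to completion. Everything in it is an assumption for illustration (the `BehavioralEval` class, the random score standing in for a real evaluation suite, the 0.9 threshold); it is not any lab’s actual monitoring setup.

```python
import random

class BehavioralEval:
    """Toy stand-in for an evaluation suite that probes a model for
    undesirable agent-like behavior (hypothetical, for illustration)."""
    def agentic_score(self, model_params):
        # A real evaluation would run the model through targeted tasks;
        # here we just return a random score so the sketch is runnable.
        return random.random()

def scale_up_with_tripwire(total_steps, eval_every=1_000, threshold=0.9):
    """Scale up a toy "model", but trip the alarm and halt early if the
    behavioral eval crosses the (assumed) threshold."""
    model_params = {"scale": 0}          # toy stand-in for model weights
    evaluator = BehavioralEval()
    for step in range(total_steps):
        model_params["scale"] += 1       # toy stand-in for a training step
        if step % eval_every == 0:
            score = evaluator.agentic_score(model_params)
            if score > threshold:
                # Trip the alarm: stop scaling up and hand the incident
                # to humans for investigation, as in the scenario above.
                return ("halted", step, score)
    return ("completed", total_steps, None)

if __name__ == "__main__":
    print(scale_up_with_tripwire(total_steps=10_000))
```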