I think there will be a period in the future when AI systems (models and their scaffolding) are sufficiently capable to speed up many aspects of computer-based R&D, including recursive self-improvement, alignment research, and control research. Such a period is unlikely to last long, since surely some greedy actor will pursue RSI. So personally, that's why I'm not putting a lot of faith in getting to that period [edit: resulting in safety].
I think that if you build scaffolding which would make current models substantially helpful at research (which would be impressively strong scaffolding indeed!), then you have built dual-use scaffolding which could also be useful for RSI. So any plan to do this must take appropriate security measures, or it will be net harmful.
Agree with a lot of this, but scaffolds still seem pretty good to me, for reasons largely similar to those in https://www.lesswrong.com/posts/fRSj2W4Fjje8rQWm9/thoughts-on-sharing-information-about-language-model#Accelerating_LM_agents_seems_neutral__or_maybe_positive_.