Commenting on the general case, rather than GPT-7 in particular: my background view on this kind of thing is that there are many different ways of reaching AGI in principle, and the vast majority of paths to AGI don’t result in early-generation AGI systems being alignable in a reasonable amount of time. (Or they’re too slow/limited/etc. and end up being irrelevant.)
The most likely (and also the most conservative) view is that (efficient, effective) alignability is a rare feature—not necessarily a hard-to-achieve feature if you have a broad-strokes idea of what you’re looking for and you’ve spent the years leading up to AGI deliberately steering toward the alignable subspace of AI approaches, but still not one that you get for free.
I think your original Q is a good prompt to think about and discuss, but if we’re meant to assume alignability, I want to emphasize that this is the kind of assumption that should probably always get explicitly flagged. Otherwise, for most approaches to AGI that weren’t strongly filtered for alignability, answering ‘How would you reduce risk (without destroying it)?’ in real life will probably mostly be about convincing the project to never deploy, finding ways to redirect resources to other approaches, and reaching total confidence that the underlying ideas and code won’t end up stolen or posted to arXiv.
Yes, I know your position from your previous comments on the topic, but it seems that GPT-like systems are winning in the medium term and we can’t stop this. Even if they can’t be scaled to superintelligence, they may need some safety features.