I roughly agree with Akash’s comment.
But a few additional points:
It’s decently likely that it will be pretty easy to get GPT-7 to avoid breaking the law or committing other egregious failures. As systems get more capable, basic alignment approaches get better at preventing things we can measure well. It’s plausible that scalable approaches will be needed to avoid egregious and obvious failures short of takeover, but that currently seems unlikely to me. (See also the list of lethalities, and here: “The difference is that reality doesn’t force us to solve the problem, or tell us clearly which analogies are the right ones, and so it’s possible for us to push ahead and build AGI without solving alignment.”) I think it’s possible that reality forces you to solve the scalable problem directly because otherwise you’d see egregious failures, but I’d put that at more like 25% probability.
I agree that it’s plausible that scalable/robust alignment will be the binding constraint on deploying AGI (if there isn’t otherwise a robust solution by then), and that this becomes more likely as AGI becomes more salient. But the chance seems more like 50/50 to me than a sure bet. Consider gain-of-function research and lab leaks: these issues became somewhat more salient during COVID, but there’s still a ton of gain-of-function research with extremely negative EV (IMO).