I think almost all of the acceleration comes either from products that generate $, hype, and further investment, or more directly from scale-up to more powerful models. I think “We have powerful AI systems but haven’t deployed them to do stuff they are capable of” is a very short-term kind of situation, and not a particularly desirable one besides.
I’m not sure what you are comparing RLHF or WebGPT to when you say “paradigm of AIs that are much harder to align.” I think this is probably just wrong, in that (i) you are comparing to pure generative modeling, but I think that’s the wrong comparison point barring a degree of coordination much larger than what is needed to avoid scaling up models past dangerous thresholds, and (ii) I think you are wrong about the dynamics of deceptive alignment under existing mitigation strategies: scaling up generative modeling to the point where it is transformative is considerably more likely to lead to deceptive alignment than using RLHF (primarily because it involves much more intelligent models).