The word "alignment" carries two meanings, and they're often used interchangeably.
First, there is the contemporary alignment of large language models.
Then there is the study that concerns itself more generally with designing artificial intelligence systems that reliably and robustly pursue the intended goals and values of their human operators, while avoiding unintended negative consequences.
The former is only a sub-field of the latter.
The Prosaic AI Assumption is that we'll be able to produce an AGI without any further theoretical breakthroughs. This seems very likely to be correct. But people seem to be making a second, bonus assumption, the Prosaic Experimental Assumption: that findings from experiments on contemporary models will be applicable to AGI systems, including those posing existential risks.
This assumption is particularly dangerous because people on LessWrong seem to make it all the time without stating it. I think it's a serious mistake to assume that there will be no difficult-to-predict emergent properties once we have deployed an AGI.
Note that this isn't an argument against experimental research in general, as long as we are careful about what we extrapolate from that evidence.
“You are not measuring what you think you’re measuring”