Train Tracks
The above gif comes from the brilliant children’s claymation film, “Wallace and Gromit: The Wrong Trousers”. In this scene, Gromit the dog rapidly lays down track ahead of a toy train to stop it from crashing. I will argue that this is an apt analogy for the alignment situation we will face in the future, and that prosaic alignment is focused only on the first track.
The last few years have seen a move from “big brain” alignment research directions to prosaic approaches: in other words, asking how to align near-contemporary models instead of asking high-level questions about aligning general AGI systems.
This makes a lot of sense as a strategy. One, we can actually get experimental verification for our theories. And two, we seem to be in the predawn of truly general intelligence, and it would be crazy not to shift our focus towards the specific systems that seem likely to pose an existential threat. Urgency compels us to focus on prosaic alignment. To paraphrase a (now deleted) tweet from a famous researcher: “People arguing that we shouldn’t focus on contemporary systems are like people wanting to research how flammable the roof is whilst standing in a burning kitchen.”*
What I believe this view neglects is that the first systems to emerge will immediately be used to produce the second generation. AI-assisted programming has exploded in popularity, and while Superalignment is being lauded as a safety push, you can equally view it as a commitment from OpenAI to produce and deploy automated researchers within the next few years. If we do not have a general theory of alignment, we will be left in the dust.
To bring us back to the analogy: prosaic alignment is rightly focused on laying down the first train track, but we also need to be prepared to lay down successive tracks as AI development kicks off. If we don’t have a general theory of alignment, we may “paint ourselves into a corner” by developing a first generation of models that does not provide a solid basis for building future aligned models.
What exactly these hurdles will be, I don’t know. But let us hope there continues to be high-level, esoteric research so that we can safely discover and navigate these murky waters.
*Because the tweet appears to have been deleted, I haven’t attributed it to the original author. My paraphrase may be slightly off.