This is incredible—the most hopeful I’ve been in a long time. 20% of current compute and plans for a properly-sized team! I’m not aware of any clearer or more substantive sign from a major lab that they actually care, and are actually going to try not to kill everyone. I hope that DeepMind and Anthropic have great things planned to leapfrog this!
I don’t understand the model of the world under which DM/Anthropic “leapfrogging” this would be a sensible frame. There should be no notion of competition between these labs when it comes to “superalignment”. If there is, that is weak evidence that our entire lightcone is doomed.
Competition between labs on capabilities is bad; competition between labs on alignment would be fantastic.
What does competition mean to you? To me it would involve, for example, not sharing your alignment techniques with the other groups in order to “retain an edge”, or disingenuously refusing to admit that other groups’ techniques or approaches might be better than yours. That is how competitive players behave. If you’re not doing those things, then it’s not competition.
I see what you mean. I was thinking “labs try their hardest to demonstrate that they are working to align superintelligent AI, because they’ll look less responsible than their competitors if they don’t.”
I don’t think keeping “superalignment” techniques secret would generally make sense right now, since it’s in everyone’s best interests that the first superintelligence isn’t misaligned (I’m not really thinking about “alignment” work that also improves present-day capabilities, like RLHF).
As for your second point, I think that for an AI lab that wants to improve PR, the important thing is showing “we’re helping the alignment community by investing significant resources into solving this problem,” not “our techniques are better than our competitors’.” The dynamic you’re talking about might have some negative effect, but I personally think the positive effects of competition would vastly outweigh it (even though many alignment-focused commitments from AI labs will probably turn out to be not-very-helpful signaling).
When you talk of competing to look more responsible, or to improve PR, for which audience would you say they’re performing?
I know that, personally, I’ll work a lot harder, for less pay, and be more likely to apply in the first place to places that seem more responsible, but I’m not sure how strong that effect is.
Yeah, I’m mostly thinking about potential hires.