Don’t push the frontier of capabilities. Obviously this is more or less saying that Anthropic should stop making money and therefore stop existing. The more nuanced version is that for Anthropic to justify its existence, each push of the capabilities frontier should be earned by substantial progress on the other three points.
I think I have a stronger position on this than you do. I don’t think Anthropic should push the frontier of capabilities, even given the tradeoff it faces.
If their argument is “we know arms races are bad, but we have to accelerate arms races or else we can’t do alignment research,” they should be really, really sure that they do, in fact, have to do the bad thing to get the good thing. But I don’t think you can be that sure, and I think the claim is actually less than 50% likely to be true.
I don’t take it for granted that Anthropic wouldn’t exist if it didn’t push the frontier. It could operate by intentionally lagging a bit behind other AI companies while still staying roughly competitive, and/or it could compete by investing harder in good UX. I suspect a model that is (say) 25% worse would not be much less profitable.
(This is a weaker argument, but) if it does turn out that Anthropic really can’t exist without pushing the frontier and it has to close down, that’s probably a good thing. At the current level of investment in AI alignment research, I believe reducing arms race dynamics + reducing alignment research probably net decreases x-risk, and it would be better for this version of Anthropic not to exist. People at Anthropic probably disagree, but they should be very concerned that they have a strong personal incentive to disagree, and should be wary of their own bias. And they should be especially wary given that they hold the fate of humanity in their hands.