The ideal version of Anthropic would
Make substantial progress on technical AI safety
Use its voice to make people take AI risk more seriously
Support AI safety regulation
Not substantially accelerate the AI arms race
In practice I think Anthropic has
Made a little progress on technical AI safety
Used its voice to make people take AI risk less seriously[1]
Obstructed AI safety regulation
Substantially accelerated the AI arms race
What I would do differently.
Do better alignment research (I recognize this is hard).
Communicate in a manner that is consistent with the apparent belief of Anthropic leadership that alignment may be hard and x-risk is >10% probable. Their communications strongly signal “this is a Serious Issue, like climate change, and we will talk lots about it and make gestures towards fixing the problem but none of us are actually worried about it, and you shouldn’t be either. When we have to make a hard trade-off between safety and the bottom line, we will follow the money every time.”
Lobby politicians to regulate AI. When a good regulation like SB-1047 is proposed, support it.
Don’t push the frontier of capabilities. Obviously this is basically saying that Anthropic should stop making money and therefore stop existing. The more nuanced version is that for Anthropic to justify its existence, each push of the capabilities frontier should be earned by substantial progress on the other three points.
My understanding is that a significant aim of your recent research is to test models’ alignment so that people will take AI risk more seriously when things start to heat up. This seems good but I expect the net effect of Anthropic is still to make people take alignment less seriously due to the public communications of the company.
My typo reaction may have glitched, but I think you meant “Don’t push the frontier of capabilities” in the last bullet?
I think I have a stronger position on this than you do. I don’t think Anthropic should push the frontier of capabilities, even given the tradeoff it faces.
If their argument is “we know arms races are bad, but we have to accelerate arms races or else we can’t do alignment research,” they should be really, really sure that they do, actually, have to do the bad thing to get the good thing. But I don’t think they can be that sure, and I think the claim is actually less than 50% likely to be true.
I don’t take it for granted that Anthropic wouldn’t exist if it didn’t push the frontier. It could operate by intentionally lagging a bit behind other AI companies while still staying roughly competitive, and/or it could compete by investing harder in good UX. I suspect a (say) 25% worse model is not going to be much less profitable.
(This is a weaker argument, but) If it does turn out that Anthropic really can’t exist without pushing the frontier and it has to close down, that’s probably a good thing. At the current level of investment in AI alignment research, I believe reducing arms race dynamics + reducing alignment research probably net decreases x-risk, and it would be better for this version of Anthropic not to exist. People at Anthropic probably disagree, but they should be very concerned that they have a strong personal incentive to disagree, and should be wary of their own bias. And they should be especially wary given that they hold the fate of humanity in their hands.