I don’t quite understand the question. I’ve heard various bits of gossip, both as an employee and since leaving, but I wouldn’t say I’m confident in my understanding of any of it. I was somewhat sad about Jack and Dario’s public comments suggesting it’s too early to regulate (if I understood them correctly), which also surprised me, since I thought they had fairly short timelines. But policy is not at all my area of expertise, so I’m not confident in this take.
I think it’s totally plausible Anthropic has net negative impact, but the same is true for almost any significant actor in a complex situation. I agree that policy is one such way that their impact could be negative, though I’d generally bet Anthropic will push more for policies I personally support than any other lab, even if they may not push as much as I want them to.
I’m a bit worried about a dynamic where smart technical folks end up feeling like “well, I’m kind of disappointed in Anthropic’s comms/policy stuff from what I hear, and I do wish they’d be more transparent, but policy is complicated and I’m not really a policy expert”.
To be clear, this is quite a reasonable position for any given technical researcher to have; the problem is that it provides very little accountability. In a world where Anthropic was (hypothetically) dishonest, misleading, actively trying to undermine or weaken regulations, or putting its own interests above the interests of the “commons”, it seems to me that many technical researchers (even Anthropic staff) would not be aware of this. Or they might pick up some negative vibes but then slip back into a “well, I’m not a policy person, and policy is complicated” mentality.
I’m not saying there’s necessarily even a strong case that Anthropic is trying to sabotage policy efforts (though I am somewhat concerned about some of the rhetoric Anthropic uses, public comments about thinking it’s too early to regulate, rumors that they have taken actions to oppose SB 1047, and a lack of any real “positive” signals from their policy team, e.g. recommending or developing policy proposals that go beyond voluntary commitments or encouraging people to measure risks).
But I think once upon a time there was some story that if Anthropic defected in major ways, a lot of technical researchers would get concerned and quit/whistleblow. I think Anthropic’s current comms strategy, combined with the secrecy around a lot of policy things, combined with a general attitude (whether justified or unjustified) of “policy is complicated and I’m a technical person so I’m just going to defer to Dario/Jack” makes me concerned that safety-concerned people won’t be able to hold Anthropic accountable even if it actively sabotages policy stuff.
I’m also not really sure if there’s an easy solution to this problem, but I do imagine part of the solution involves technical people (especially at Anthropic) raising questions, asking people like Jack and Dario to explain their takes more, and being more willing to raise public & private discussions about Anthropic’s role in the broader policy space.
Thanks for answering, that’s very useful.

My concern is that, as far as I understand, a decent number of safety researchers think policy is the most important area, but because, as you mentioned, they aren’t policy experts and don’t really know what’s going on, they just assume Anthropic’s policy work is much better than people actually working in policy judge it to be. I’ve heard from a surprisingly large number of people at the orgs doing the best AI policy work that Anthropic’s policy efforts are mostly anti-helpful.
Somehow, though, internal employees keep deferring to their policy team and don’t update on this or take these outside assessments seriously.
> I’d generally bet Anthropic will push more for policies I personally support than any other lab, even if they may not push as much as I want them to.
If that’s true, it’s probably only true by an epsilon margin, and it might even be wrong, given the weird preferences of a non-safety-focused industry actor. AFAIK, Anthropic has pushed against every AI regulation proposal to date; I have yet to hear a positive example.
Separately, while I think the discussion around “is X net negative” can be useful, I think it ends up implicitly putting the frame on “can X justify that they are not net negative.”
I suspect the quality of discourse, and society’s chances of positive futures, would improve if the frame were more commonly something like “what are the best actions for X to take” or “what are reasonable, high-value things X could be doing.”
And I think it’s valid to think “X is net positive” while also thinking “I feel disappointed in X because I don’t think it’s using its power/resources in ways that would produce significantly better outcomes.”
IDK what the bar should be for considering X a “responsible actor”, but I imagine my personal bar is quite a bit higher than “(barely) net positive in expectation.”
P.S. Both of these comments are on the opinionated side, so separately, I just wanted to say thank you Neel for speaking up & for offering your current takes on Anthropic. Strong upvoted!