Neel Nanda’s Shortform

Neel Nanda 12 Jul 2024 7:16 UTC

World Modeling

Neel Nanda 12 Jul 2024 7:16 UTC
296 points
31
In response to Habryka’s shortform, I can confirm that I signed a concealed non-disparagement as part of my Anthropic separation agreement. I worked there for 6 months and left in mid-2022. I received a cash payment as part of that agreement, with nothing shady going on a la threatening previous compensation (though I had no equity to threaten). In hindsight I undervalued my ability to speak freely, and didn’t seriously consider that I could just decline to sign the separation agreement and walk away. I’m not sure what I would do if doing it again.

I asked Anthropic to release me from this after the comment thread started, and they have now released me from both the non-disparagement clause, and the non-disclosure part, which was very nice of them—I would encourage anyone in a similar situation to reach out to hr[at]anthropic.com and legal[at]anthropic.com, though obviously can’t guarantee that they’ll release everyone. Feel free to DM or email for advice if you’re in a similar situation.

I’ll take advantage of my newfound freedoms to say that...

Idk, I don’t really have anything too disparaging to say (though I dislike the use of concealed non-disparagements in general and am glad they say they’re stopping!). I’m broadly a fan of Anthropic, think their heart is likely in the right place and they’re trying to do what’s best for the world (though could easily be making the wrong calls) and would seriously consider returning in the right circumstances. I’ve recommended that several friends of mine accept offers to do safety and interp work there, and feel good about this (though would feel much more hesitant about recommending that someone join a pure capabilities team there). My biggest critique is that I have concerns about their willingness to push the capabilities frontier and worsen race dynamics and, while I can imagine reasonable justifications, I think they’re undervaluing the importance of at least having clear public positions and rationales for this kind of thing and their clear shift in policies since Claude 1.0.

EDIT: An additional detail that I genuinely appreciate is that Anthropic paid for me to have an independent lawyer to help explain the separation agreement and negotiate some changes on my behalf (I didn’t push back on the concealed non-disparagement, but did alter some other parts). They recommended an independent lawyer, who I used, but were also happy to pay for a lawyer of my choice. As far as I’m aware, this was quite a non-standard thing for a company to do, and I appreciate it and think this was good and ethical in a way that wasn’t obligatory.

EDIT 2: Someone asked that I share the terms of the agreement.

The non-disparagement clause:

Without prejudice to clause 6.3 [referring to my farewell letter to Anthropic staff, which I don’t think was disparaging or untrue, but to be safe], each party agrees that it will not make or publish or cause to be made or published any disparaging or untrue remark about the other party or, as the case may be, its directors, officers or employees. However, nothing in this clause or agreement will prevent any party to this agreement from (i) making a protected disclosure pursuant to Part IVA of the Employment Rights Act 1996 and/or (ii) reporting a criminal offence to any law enforcement agency and/or a regulatory breach to a regulatory authority and/or participating in any investigation or proceedings in either respect.

The non-disclosure clause:

Without prejudice to clause 6.3 [referring to my farewell letter to Anthropic staff] and 7 [about what kind of references Anthropic could provide for me], both Parties agree to keep the terms and existence of this agreement and the circumstances leading up to the termination of the Consultant’s engagement and the completion of this agreement confidential save as [a bunch of legal boilerplate, and two bounded exceptions I asked for but would rather not publicly share. I don’t think these change anything, but feel free to DM if you want to know]
What links here?
- AI #73: Openly Evil AI by Zvi (18 Jul 2024 14:40 UTC; 89 points)
- simeon_c 12 Jul 2024 19:29 UTC
  11 points
  14
  How aware were you (as an employee) & are you (now) of their policy work? In a world model where policy is the most important stuff, it seems to me like it could seriously tarnish Anthropic’s net impact.
  - Neel Nanda 12 Jul 2024 21:05 UTC
    16 points
    2
    I don’t quite understand the question. I’ve heard various bits of gossip, both as an employee and now. I wouldn’t say I’m confident in my understanding of any of it. I was somewhat sad about Jack and Dario’s public comments about thinking it’s too early to regulate (if I understood them correctly), which I also found surprising as I thought they had fairly short timelines, but policy is not at all my area of expertise so I am not confident in this take.
    
    I think it’s totally plausible Anthropic has net negative impact, but the same is true for almost any significant actor in a complex situation. I agree that policy is one such way that their impact could be negative, though I’d generally bet Anthropic will push more for policies I personally support than any other lab, even if they may not push as much as I want them to.
    - Akash 12 Jul 2024 21:40 UTC
      38 points
      35
      I’m a bit worried about a dynamic where smart technical folks end up feeling like “well, I’m kind of disappointed in Anthropic’s comms/policy stuff from what I hear, and I do wish they’d be more transparent, but policy is complicated and I’m not really a policy expert”.
      To be clear, this is a quite reasonable position for any given technical researcher to have. The problem is that it provides very little accountability. In a world where Anthropic was (hypothetically) dishonest, misleading, actively trying to undermine/weaken regulations, or putting its own interests above the interests of the “commons”, it seems to me like many technical researchers (even Anthropic staff) would not be aware of this. Or they might get some negative vibes but then slip back into a “well, I’m not a policy person, and policy is complicated” mentality.
      I’m not saying there’s even necessarily a strong case that Anthropic is trying to sabotage policy efforts (though I am somewhat concerned about some of the rhetoric Anthropic uses, public comments about thinking it’s too early to regulate, rumors that they have taken actions to oppose SB 1047, and a lack of any real “positive” signals from their policy team, e.g. recommending or developing policy proposals that go beyond voluntary commitments or encouraging people to measure risks).
      But I think once upon a time there was some story that if Anthropic defected in major ways, a lot of technical researchers would get concerned and quit/whistleblow. I think Anthropic’s current comms strategy, combined with the secrecy around a lot of policy things, combined with a general attitude (whether justified or unjustified) of “policy is complicated and I’m a technical person so I’m just going to defer to Dario/Jack” makes me concerned that safety-concerned people won’t be able to hold Anthropic accountable even if it actively sabotages policy stuff.
      I’m also not really sure if there’s an easy solution to this problem, but I do imagine part of the solution involves technical people (especially at Anthropic) raising questions, asking people like Jack and Dario to explain their takes more, and being more willing to raise public & private discussions about Anthropic’s role in the broader policy space.
    - simeon_c 13 Jul 2024 15:12 UTC
      16 points
      13
      Thanks for answering, that’s very useful.
      My concern is that as far as I understand, a decent number of safety researchers are thinking that policy is the most important area, but because, as you mentioned, they aren’t policy experts and don’t really know what’s going on, they just assume that Anthropic policy work is way better than those actually working in policy judge it to be. I’ve heard from a surprisingly high number of people among the orgs that are doing the best AI policy work that Anthropic policy is mostly anti-helpful.
      Somehow though, internal employees keep deferring to their policy team and don’t update on that part/take their beliefs seriously.
      “I’d generally bet Anthropic will push more for policies I personally support than any other lab, even if they may not push as much as I want them to.”
      If it’s true, it is probably true to an epsilon degree, and it might be wrong because of weird preferences of a non-safety industry actor. AFAIK, Anthropic has been pushing against all the AI regulation proposals to date. I have yet to hear a positive example.
    - Akash 12 Jul 2024 21:45 UTC
      10 points
      1
      Separately, while I think the discussion around “is X net negative” can be useful, I think it ends up implicitly putting the frame on “can X justify that they are not net negative.”
      I suspect the quality of discourse (and society’s chances of having a positive future) would improve if the frame were more commonly something like “what are the best actions for X to take” or “what are reasonable/high-value things that X could be doing.”
      And I think it’s valid to think “X is net positive” while also thinking “I feel disappointed in X because I don’t think it’s using its power/resources in ways that would produce significantly better outcomes.”
      IDK what the bar should be for considering X a “responsible actor”, but I imagine my personal bar is quite a bit higher than “(barely) net positive in expectation.”
      P.S. Both of these comments are on the opinionated side, so separately, I just wanted to say thank you Neel for speaking up & for offering your current takes on Anthropic. Strong upvoted!
Neel Nanda 5 Dec 2024 19:14 UTC
53 points
15
A tip for anyone on the ML job/PhD market: people will plausibly be quickly skimming your Google Scholar to get a sense of “how impressive is this person/what is their deal” (I do this fairly often), so I recommend polishing your Google Scholar if you have publications! It can make a big difference.

I have a lot of weird citable artefacts that confuse Google Scholar, so here are some tips I’ve picked up:
- First, make a Google Scholar profile if you don’t already have one!
  - Verify the email (otherwise it doesn’t show up properly in search)
- (Important!) If you are co-first author on a paper but not in the first position, indicate this by editing the names of all co-first authors to end in a *
  - You edit by logging in to the Google account you made the profile with, going to your profile, clicking on the paper’s name, and then editing the authors’ names
  - Co-first vs second author makes a big difference to how impressive a paper is, so you really want this to be clear!
- Edit the venue of your work to be the most impressive place it was published, and include any notable awards from the venue (eg spotlight, oral, paper awards, etc).
  - You can edit this by clicking on the paper name and editing the journal field.
  - If it was a workshop, make sure you include the word workshop (otherwise it can appear deceptive).
  - See my profile for examples.
- Hunt for lost citations: Often papers have weirdly formatted citations and Google Scholar gets confused and thinks it was a different paper. You can often find these by clicking on the plus just below your profile picture, then “Add articles”, and then clicking through the pages for anything that you wrote. Add all these papers, and then use the merge function to combine them into one paper (with a combined citation count).
  - Merge lets you choose which of the merged artefacts gets displayed
  - Merge = return to the main page, click the tick box next to the paper titles, then click merge at the top
  - Similar advice applies if you have eg a blog post that was later turned into a paper, and have citations for both
  - Another merging hack: if you have a weird artefact on your Google Scholar (eg a blog post or library) and you don’t like how Google Scholar thinks it should be presented, you can manually add the citation in the format you like, and then merge this with the existing citation, and display your new one
- If you’re putting citations on a CV, Semantic Scholar is typically better for numbers, as it updates more frequently than Google Scholar, though it’s worse at picking up on the existence of non-paper artefacts like a cited GitHub repo or blog post
- Make your affiliation/title up to date at the top