Sure, here are some things:
Anthropic should publicly clarify the commitments it made about not pushing the state of the art forward during the early years of the organization.
Anthropic should appoint genuinely independent members to the Long Term Benefit Trust, and should ensure the LTBT is active and taking its supervisory role seriously.
Anthropic should remove any provisions that allow shareholders to disempower the LTBT.
Anthropic should state openly and clearly that the present path to AGI presents an unacceptable existential risk, and call for policymakers to stop, delay, or hinder the development of AGI.
Anthropic should publicly state its opinions on which AGI architectures or training processes it considers more dangerous (probably long-horizon RL training, for example), and either commit to avoiding those architectures and training processes, or at least very loudly complain that the field at large should not use them.
Anthropic should not ask employees or contractors to sign non-disparagement agreements with Anthropic, especially not self-cloaking ones.
Anthropic should take a humanist/cosmopolitan stance on risks from AGI, in which risks related to different people having different values are very clearly deprioritized compared to risks related to complete human disempowerment or extinction, as worry about the former seems likely to cause much of the latter.
Anthropic should do more collaborations like the one you just did with Redwood, where external contractors get access to internal models. This is of course hard infosec-wise, but I think you can probably do better than you are doing right now.
Anthropic should publicly clarify the current state of its 3:1 equity donation matching program, which it advertised publicly (and which played a substantial role in many people external to Anthropic supporting it, given that they expected a large fraction of the equity to therefore be committed to charitable purposes). Recent communications suggest that whatever equity matching program Anthropic now has does not fit what was advertised.
I can probably think of some more.
I’d add:
Support explicit protections for whistleblowers.
Anthropic should state openly and clearly that the present path to AGI presents an unacceptable existential risk, and call for policymakers to stop, delay, or hinder the development of AGI.
I’ll echo this and strengthen it to:
… call for policymakers to stop the development of AGI.
I gather that they changed the donation matching program for future employees, but the 3:1 match still holds for prior employees, including all early employees (this change happened after I left, when Anthropic was maybe 50 people?).
I’m sad about the change, but I think any goodwill that stems from believing the founders have pledged much of their equity to charity is still reasonable and is not invalidated by the change.
If it still holds for early employees, that would be a good clarification, and I totally agree with you that, if that is the case, no goodwill was invalidated! That’s part of why I was asking for clarification. I (personally) wouldn’t be surprised if this had also been changed for early employees (and am currently close to 50/50 on that being the case).
The old 3:1 match still applies to employees who joined prior to May/June-ish 2024. For new joiners it’s indeed now 1:1 as suggested by the Dario interview you linked.
That’s great to hear, thank you for clarifying!
I would be very surprised if it had changed for early employees. I considered the donation matching to be part of my compensation package (it effectively 2.5x’d the amount of equity, since it was a 3:1 match on half my equity), and it would be pretty norm-violating to retroactively reduce compensation.
If it had happened, I would have expected it to have been negotiated somehow with early employees (in a way that they agreed to, but not necessarily one that any external observers agreed to).
But it seems like it is confirmed that the early matching is indeed still active!
I can also confirm (I have a 3:1 match).
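For concreteness, here is the arithmetic behind the “2.5x” figure mentioned a couple of comments up, as a minimal sketch; the 3:1 ratio and the half-of-equity eligibility come from that comment, while the normalization of the equity to 1 unit is just for illustration:

```python
# Illustrative sketch of a 3:1 donation match applied to half of an
# employee's equity (figures taken from the comment above; equity normalized to 1 unit).
equity = 1.0              # employee's equity, normalized to 1 unit
match_eligible = 0.5      # half of the equity was eligible for the match
match_ratio = 3.0         # the company adds 3 units for every 1 unit donated

company_match = match_eligible * match_ratio   # 1.5 units contributed by the match
effective_total = equity + company_match       # 2.5 units of total value

print(effective_total)    # prints 2.5
```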
Anthropic should take a humanist/cosmopolitan stance on risks from AGI, in which risks related to different people having different values are very clearly deprioritized compared to risks related to complete human disempowerment or extinction, as worry about the former seems likely to cause much of the latter.
Can you say more about the last clause (that worry about the former seems likely to cause much of the latter), or link me to a canonical text on this tradeoff?
OpenAI, Anthropic, and xAI were all founded substantially because their founders were worried that other people would get to AGI first, and then use that to impose their values on the world.
In general, if you view developing AGI as a path to godlike power (as opposed to a doomsday device that will destroy most value independently of who gets there first), it makes a lot of sense to rush towards it. As such, the concern that people will “do bad things with the AI that they will endorse, but I won’t” is the cause of a substantial fraction of worlds where we recklessly race past the precipice.
Thanks for the clarification — this is in fact very different from what I thought you were saying, which was something more like “FATE-esque concerns fundamentally increase x-risk in ways that aren’t just about (1) resource tradeoffs or (2) side-effects of poorly considered implementation details.”
I mean, it’s related. FATE (fairness, accountability, transparency, and ethics) stuff tends to center around misuse. I think it makes sense for organizations like Anthropic to commit to heavily prioritizing accident risk over misuse risk, since most forms of misuse-risk mitigation require getting involved in various more zero-sum-ish conflicts, and it makes sense for there to be safety-focused institutions that are committed to prioritizing the things that really all stakeholders can agree are definitely bad, like human extinction or permanent disempowerment.