But I also think that one of the reasons why Richard still works at OpenAI is because he’s the kind of agent who genuinely believes things that tend to be pretty aligned with OpenAI’s interests, and I suspect his perspective is informed by having lots of friends/colleagues at OpenAI.
Added a disclaimer, as suggested. It seems like a good practice for this sort of post. Though note that I disagree with this paragraph; I don’t think “being the kind of agent who X” or “being informed by many people at Y” are good reasons to give disclaimers. Whereas I do buy that “they filter out any ideas that they have that could get them in trouble with the company” is an important (conscious or unconscious) effect, and worth a disclaimer.
I’ve also added this note to the text:
Note that most big companies (especially AGI companies) are strongly structurally power-seeking too, and this is a big reason why society at large is so skeptical of and hostile to them. I focused on AI safety in this post both because companies being power-seeking is an idea that’s mostly “priced in”, and because I think that these ideas are still useful even when dealing with other power-seeking actors.
I appreciate you adding the note, though I do think the situation is far more unusual than described. I agree it’s widely priced in that companies in general seek power, but I think probably less so that the author of this post personally works for a company which is attempting to acquire drastically more power than any other company ever, and that much of the behavior the post describes as power-seeking amounts to “people trying to stop the author and his colleagues from attempting that.”
(I think “being the kind of agent who survives the selection process” can sometimes be an important epistemic thing to consider, though mostly when thinking about how systems work and what kinds of people/views those systems promote. Agreed that “being informed by many people at Y” is a rather weak one & certainly would not on its own warrant a disclosure.)
I think “being the kind of agent who survives the selection process” can sometimes be an important epistemic thing to consider
I’m not claiming it’s zero information, but there are lots of things that convey non-zero information which it’d be bad to set disclosure norms based on. E.g. “I’ve only ever worked at nonprofits” should definitely affect your opinion of someone’s epistemics (e.g. when they’re trying to evaluate corporate dynamics) but once we start getting people to disclose that sort of thing there’s no clear stopping point. So mostly I want the line to be “current relevant conflicts of interest”.
My take atm is “seems right that this shouldn’t be a permanent norm; there are definitely costs of disclaimer-ratcheting that are pretty bad. I think it might still be the right thing to do of your own accord in some cases, which is, like, supererogatory.”
I think there’s maybe a weird thing with this post, where it’s trying to be the timeless, abstract version of itself. It’s certainly easier to write the timeless, abstract version than the “digging into specific examples and calling people out” version. But I think the digging into specific examples is actually kind of important here – it’s easy to come away with vague takeaways that nobody disagrees with, where everyone nods along but then mostly thinks it’s Those Other Guys who are being power-seeking.
Given that it’s probably 10-50x harder to write the Post With Specific Examples, I think actually a pretty okay outcome is “ship the vague post, and let discussion in the comments get into the inside-baseball details.” And then it’d be remiss for the post-author’s role in the ecosystem not to come up as an example to dig into.