I think two major cruxes for me here are:
1. Is it actually tractable to affect DeepMind's culture and organizational decisionmaking?
2. How close is Anthropic to the threshold of having a good enough safety culture?
My current best guess is that Anthropic is still under the threshold for a good enough safety culture (despite seeming better than I expected in a number of ways), while DeepMind is intractably far gone.
I think people should be hesitant to work at any scaling lab, but I think it might be possible to make Anthropic "the one actually good scaling lab". I don't currently expect that to be tractable at DeepMind, and "having at least one" seems good for the world (although it's a bit hard for me to articulate why at the moment).
I am interested in hearing details about DeepMind that anyone thinks should change my mind about this.
This viewpoint is based on having spent at least tens of hours, at various times, trying to learn about and influence both orgs' cultures.
In both cases, I don't get the sense that people at the orgs really have a visceral sense that "decisionmaking processes can be fake." I think those processes will be fake by default, and that the org is better modeled as following general incentives; and DeepMind has too many moving people and moving parts, at too low a density, for fixing that to seem possible. For me to change my mind about this, I would need someone there to look me in the eye and explain that they do have a visceral sense of how organizational decisionmaking processes can be fake, and why they nonetheless think DeepMind is tractable to fix. I assume @Rohin Shah and @Neel Nanda can't really say anything publicly that's capable of changing my mind, for various confidentiality and political reasons, but, like, that's my crux.
(Convincing me in more general terms that "Ray, you're too pessimistic about org culture" could hypothetically also work, but you have a lot of work to do given how thoroughly those pessimistic predictions came true about OpenAI.)
I think Anthropic also has this problem, but it is close enough to the threshold, with almost-aligned leadership and actually-pretty-aligned people, that fixing it feels at least possible to me. The main things that would persuade me they are over the critical threshold: publicly spending social capital on clearly spelling out why the x-risk problem is hard, and making explicit plans to not merely pause for a bit when they hit an RSP threshold, but (at least in some circumstances) to advocate strongly for a government-enforced global shutdown lasting 20+ years.
I think your pessimism about org culture is pretty relevant to the big decisions GDM may make, but I think there is absolutely still a case to be made for the value of alignment research, wherever it is conducted. If the research ends up published, its origin shouldn't be held too much against it.
So yes, having a few more researchers at GDM doesn’t solve the corporate race problem, but I don’t think it worsens it either.
As for pausing, I think it's a terrible idea. I'm pretty confident that any sort of large-scale pause would be focused on compute thresholds, and would be worse than not pausing because it would shift research pressure towards algorithmic efficiency. More on that here: https://www.lesswrong.com/posts/Kobbt3nQgv3yn29pr/my-theory-of-change-for-working-in-ai-healthtech?commentId=qwixG4xYeFdELb2GJ
It might be "fine" to do research at GDM (depending on how free you are to actually pursue good research directions, or how good a mentor you have). But part of the schema in Mark's post is "where should one go for actively good second-order effects?".