Remmelt comments on Why I think it’s net harmful to do technical safety research at AGI labs

Remmelt 7 Feb 2024 7:30 UTC
9 points
1
Someone asked:

“Why would having [the roles] be filled by someone in EA be worse than a non EA person? can you spell this out for me? I.e. are EA people more capable? would it be better to have less competent people in such roles? not clear to me that would be better”

Here was my response:

So I was thinking about this.

Considering this only as an individual decision can be limiting. Even 80k staff have acknowledged that sometimes you need a community to make progress on something.

For similar reasons, protests work better if there are multiple people showing up.

What would happen if 80k and other EA organisations stopped recommending positions at AGI labs and honestly pointed out that work at these labs turned out to be bad – because it has turned out the labs have defected on their end of the bargain and don’t care enough about getting safety right..?

It would make an entire community of people become aware that we may need to actively start restricting this harmful work. Instead, what we’ve been seeing is EA orgs singing praise for AGI lab leaders for years, and 80k still recommending talented idealistic people join AGI labs. I’d rather see less talented sketchy-looking people join the AGI labs.

I would rather see everyone in AI Safety become clearer with each other and with the public that we are not condoning harmful automation races to the bottom. We’re not condoning work at these AGI labs and we are no longer giving our endorsement to it.
- Remmelt 7 Feb 2024 8:32 UTC
  10 points
  5
  Parent
  Their question was also responding to my concerns on how 80,000 Hours handpicks jobs at AGI labs.
  
  Some of those advertised jobs don’t even focus on safety – instead they look like policy lobbying roles or engineering support roles.
  
  Nine months ago, I wrote this email to 80k staff:
  Hi [x, y, z]
  I noticed the job board lists positions at OpenAI and AnthropicAI under the AI Safety category:
  
  Not sure whom to contact, so I wanted to share these concerns with each of you:
  Capability races
  OpenAI’s push for scaling the size and applications of transformer-network-based models has led Google and others to copy and compete with them.
  Anthropic now seems on a similar trajectory.
  By default, these should not be organisations supported by AI safety advisers with a security mindset.
  No warning
  Job applicants are not warned of the risky past behaviour by OpenAI and Anthropic. Given that 80K markets to a broader audience, I would not be surprised if 50%+ are not much aware of the history. The subjective impression I get is that taking the role will help improve AI safety and policy work.
  At the top of the job board, positions are described as “Handpicked to help you tackle the world’s most pressing problems with your career.”
  If anything, “About this organisation” makes the companies look more comprehensively careful about safety than they have actually been:
  “Anthropic is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems.”
  “OpenAI is an AI research and deployment company, with roles working on AI alignment & safety.”
  It is understandable that people aspiring for AI safety & policy careers are not much aware, and therefore should be warned.
  However, 80K staff should be tracking the harmful race dynamics and careless deployment of systems by OpenAI, and now Anthropic.
  The departure of OpenAI’s safety researchers was widely known, and we have all been tracking the hype cycles around ChatGPT.
  Various core people in the AI Safety community have mentioned concerns about Anthropic.
  Oliver Habryka mentions this as part of the reasoning for shutting down the LightCone offices:
  I feel quite worried that the alignment plan of Anthropic currently basically boils down to “we are the good guys, and by doing a lot of capabilities research we will have a seat at the table when AI gets really dangerous, and then we will just be better/more-careful/more-reasonable than the existing people, and that will somehow make the difference between AI going well and going badly”. That plan isn’t inherently doomed, but man does it rely on trusting Anthropic’s leadership, and I genuinely only have marginally better ability to distinguish the moral character of Anthropic’s leadership from the moral character of FTX’s leadership, and in the absence of that trust the only thing we are doing with Anthropic is adding another player to an AI arms race.
  More broadly, I think AI Alignment ideas/the EA community/the rationality community played a pretty substantial role in the founding of the three leading AGI labs (Deepmind, OpenAI, Anthropic), and man, I sure would feel better about a world where none of these would exist, though I also feel quite uncertain here. But it does sure feel like we had a quite large counterfactual effect on AI timelines.
  Not safety focussed
  Some jobs seem far removed from positions of researching (or advising on restricting) the increasing harms of AI-system scaling.
  For OpenAI:
  IT Engineer, Support: “The IT team supports Mac endpoints, their management tools, local network, and AV infrastructure”
  Software Engineer, Full-Stack: “to build and deploy powerful AI systems and products that can perform previously impossible tasks and achieve unprecedented levels of performance.”
  For Anthropic:
  Technical Product Manager: “Rapidly prototype different products and services to learn how generative models can help solve real problems for users.”
  Prompt Engineer and Librarian: “Discover, test, and document best practices for a wide range of tasks relevant to our customers.”
  Align-washing
  Even if an accepted job applicant gets to be in a position of advising on and restricting harmful failure modes, how do you trade this off against:
  the potentially large marginal relative difference in skills of top engineering candidates you sent OpenAI’s and Anthropic’s way, and are accepted to do work for scaling their technology stack?
  how these R&D labs will use the alignment work to market the impression that they are safety-conscious, to:
  avoid harder safety mandates (eg. document their copyrights-infringing data, don’t allow API developers to deploy spaghetti code all over the place)?
  attract other talented idealistic engineers and researchers?
  and so on?
  I’m confused and, to be honest, shocked that these positions are still listed for R&D labs heavily invested in scaling AI system capabilities (without commensurate care for the exponential increase in the number of security gaps and ways to break our complex society and supporting ecosystem that opens up). I think this is pretty damn bad.
  Preferably, we can handle this privately and not make it bigger. If you can come back on these concerns in the next two weeks, I would very much appreciate that.
  
  If not, or not sufficiently addressed, I hope you understand that I will share these concerns in public.
  
  Warm regards,
  
  Remmelt
  - Remmelt 7 Feb 2024 8:32 UTC
    0 points
    0
    Parent
    80k removed one of the positions I flagged: Software Engineer, Full-Stack, Human Data Team (reason given: it looked potentially more capabilities-focused than the original job posting that came into their system).
    
    For the rest, little has changed:
    80k still lists jobs that help AGI labs scale commercially,
    Jobs with similar names:
    research engineer product, prompt engineer, IT support, senior software engineer.
    80k still describes these jobs as “Handpicked to help you tackle the world’s most pressing problems with your career.”
    80k still describes Anthropic as “an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems”.
    80k staff still have not accounted for the fact that >50% of their broad audience checking 80k’s handpicked jobs are not much aware of the potential issues of working at an AGI lab.
    Readers there don’t get informed. They get to click on the button ‘VIEW JOB DETAILS’, taking them straight to the job page. From there, they can apply and join the lab unprepared.
    
    Two others in AI Safety also discovered the questionable job listings. They are disappointed in 80k.
    
    Feeling exasperated about this. Thinking of putting out another post just to discuss this issue.
    - Benjamin Hilton 7 Feb 2024 19:15 UTC
      3 points
      −2
      Parent
      [x-posted from EA forum]
      
      Hi Remmelt,
      Thanks for sharing your concerns, both with us privately and here on the forum. These are tricky issues and we expect people to disagree about how to weigh all the considerations — so it’s really good to have open conversations about them.
      Ultimately, we disagree with you that it’s net harmful to do technical safety research at AGI labs. In fact, we think it can be the best career step for some of our readers to work in labs, even in non-safety roles. That’s the core reason why we list these roles on our job board.
      We argue for this position extensively in my article on the topic (and we only list roles consistent with the considerations in that article).
      Some other things we’ve published on this topic in the last year or so:
      A range of opinions from anonymous experts about the upsides and downsides of working on AI capabilities
      How policy roles in AI companies can be valuable for career capital and for direct impact (as well as the potential downsides)
      We recently released a podcast episode with Nathan Labenz on some of the controversy around OpenAI, including his concerns about some of their past safety practices, whether ChatGPT’s release was good or bad, and why its mission of developing AGI may be too risky.
      Benjamin
      - yanni kyriacos 8 Feb 2024 0:45 UTC
        8 points
        4
        Parent
        Hi Benjamin—would be interested in your take on a couple of things:
        
        1. By recommending people work at big labs, do you think this has a positive Halo Effect for the labs’ brand? I.e. 80k is known for wanting people to do good in the world, so by recommending people invest their careers at a lab, then those positive brand associations get passed onto the lab (this is how most brand partnerships work).
        
        2. If you think the answer to #1 is Yes, then do you believe the cost of this Halo Effect is outweighed by the benefit of having safety minded EA / Rationalist folk inside big labs?
      - Remmelt 8 Feb 2024 4:25 UTC
        6 points
        −1
        Parent
        [cross-posted replies from EA Forum]
        
        Ben, it is very questionable that 80k is promoting non-safety roles at AGI labs as ‘career steps’.
        
        Consider that your model of this situation may be wrong (account for model error).
        The upside is that you enabled some people to skill up and gain connections.
        The downside is that you are literally helping AGI labs to scale commercially (as well as indirectly supporting capability research).
        A range of opinions from anonymous experts about the upsides and downsides of working on AI capabilities
        I did read that compilation of advice, and responded to that in an email (16 May 2023):
        “Dear [a],
        
        People will drop in and look at job profiles without reading your other materials on the website. I’d suggest just writing a do-your-research cautionary line about OpenAI and Anthropic in the job descriptions themselves.
        
        Also suggest reviewing whether to trust advice on whether to take jobs that contribute to capability research.
        Particularly advice by nerdy researchers paid/funded by corporate tech.
        Particularly by computer-minded researchers who might not be aware of the limitations of developing complicated control mechanisms to contain complex machine-environment feedback loops.
        Totally up to you of course.
        Warm regards,
        Remmelt”
        
        We argue for this position extensively in my article on the topic
        This is what the article says:
        “All that said, we think it’s crucial to take an enormous amount of care before working at an organisation that might be a huge force for harm. Overall, it’s complicated to assess whether it’s good to work at a leading AI lab — and it’ll vary from person to person, and role to role.”
        
        So you are saying that people are making a decision about working for an AGI lab that might be (or actually is) a huge force for harm. And that whether it’s good (or bad) to work at an AGI lab depends on the person – ie. people need to figure this out for them personally.
        
        Yet you are openly advertising various jobs at AGI labs on the job board. People are clicking through and applying. Do you know how many read your article beforehand?
        
        ~ ~ ~
        Even if they did read through the article, both the content and framing of the advice seem misguided. Notice what is emphasised in your considerations.
        Here are the first sentences of each consideration section:
        (ie. as what readers are most likely to read, and what you might most want to convey).
        “We think that a leading — but careful — AI project could be a huge force for good, and crucial to preventing an AI-related catastrophe.”
        Is this your opinion about DeepMind, OpenAI and Anthropic?
        
        “Top AI labs are high-performing, rapidly growing organisations. In general, one of the best ways to gain career capital is to go and work with any high-performing team — you can just learn a huge amount about getting stuff done. They also have excellent reputations more widely. So you get the credential of saying you’ve worked in a leading lab, and you’ll also gain lots of dynamic, impressive connections.”
        Is this focussing on gaining prestige and (nepotistic) connections as an instrumental power move, with the hope of improving things later...?
        Instead of on actually improving safety?
        
        “We’d guess that, all else equal, we’d prefer that progress on AI capabilities was slower.”
        Why is only this part stated as a guess?
        I did not read “we’d guess that a leading but careful AI project, all else equal, could be a force of good”.
        Or inversely: “we think that continued scaling of AI capabilities could be a huge force of harm.”
        Notice how those framings come across very differently.
        Wait, reading this section further is blowing my mind.
        “But that’s not necessarily the case. There are reasons to think that advancing at least some kinds of AI capabilities could be beneficial. Here are a few”
        “This distinction between ‘capabilities’ research and ‘safety’ research is extremely fuzzy, and we have a somewhat poor track record of predicting which areas of research will be beneficial for safety work in the future. This suggests that work that advances some (and perhaps many) kinds of capabilities faster may be useful for reducing risks.”
        Did you just argue for working on some capabilities because it might improve safety? This is blowing my mind.
        “Moving faster could reduce the risk that AI projects that are less cautious than the existing ones can enter the field.”
        Are you saying we should consider moving faster because there are people less cautious than us?
        Do you notice how a similarly flavoured argument can be used by and is probably being used by staff at three leading AGI labs that are all competing with each other?
        Did OpenAI moving fast with ChatGPT prevent Google from starting new AI projects?
        “It’s possible that the later we develop transformative AI, the faster (and therefore more dangerously) everything will play out, because other currently-constraining factors (like the amount of compute available in the world) could continue to grow independently of technical progress.”
        How would compute grow independently of AI corporations deciding to scale up capability?
        The AGI labs were buying up GPUs to the point of shortage. Nvidia was not able to supply them fast enough. How is that not getting Nvidia and other producers to increase production of GPUs?
        More comments on the hardware overhang argument here.
        “Lots of work that makes models more useful — and so could be classified as capabilities (for example, work to align existing large language models) — probably does so without increasing the risk of danger”
        What is this claim based on?
        
        “As far as we can tell, there are many roles at leading AI labs where the primary effects of the roles could be to reduce risks.”
        As far as I can tell, this is not the case.
        For technical research roles, you can go by what I just posted.
        For policy, I note that you wrote the following:
        “Labs also often don’t have enough staff… to figure out what they should be lobbying governments for (we’d guess that many of the top labs would lobby for things that reduce existential risks).”
        I guess that AI corporations use lobbyists for lobbying to open up markets for profit, and to not get actually restricted by regulations (maybe to move focus to somewhere hypothetically in the future, maybe to remove upstart competitors who can’t deal with the extra compliance overhead, but don’t restrict us now!).
        On prior, that is what you should expect, because that is what tech corporations do everywhere. We shouldn’t expect on prior that AI corporations are benevolent entities that are not shaped by the forces of competition. That would be naive.
        
        ~ ~ ~
        After that, there is a new section titled “How can you mitigate the downsides of this option?”
        That section reads as thoughtful and reasonable.
        How about on the job board, you link to that section in each AGI lab job description listed, just above the ‘VIEW JOB DETAILS’ button?
        For example, you could append and hyperlink ‘Suggestions for mitigating downsides’ to the organisational descriptions of Google DeepMind, OpenAI and Anthropic.
        That would help guide potential applicants to AGI lab positions in thinking through their decision.