Safety consultations for AI lab employees
Many people who are concerned about AI x-risk work at AI labs, in the hope of doing directly useful work, boosting a relatively responsible lab, or causing their lab to be safer on the margin.
Labs do lots of stuff that affects AI safety one way or another. Even at best, following all of this would be hard; in practice, labs are incentivized to be misleading in both their public and internal comms, making it even harder to track what's happening. So people end up misinformed, which often leads them to make suboptimal choices.
In my AI Lab Watch work, I pay attention to what AI labs do and what they should do. So I’m in a good position to inform interested but busy people.
I'm announcing an experimental service where I provide the following:
Calls for current and prospective employees of frontier AI labs.
On these (confidential) calls, I can answer your questions about frontier AI labs' current safety-relevant actions, policies, commitments, and statements, to help you make more informed choices.
These calls are open to any employee of OpenAI, Anthropic, Google DeepMind, Microsoft AI, or Meta AI, and to anyone strongly considering working at one (with an offer in hand or expecting to receive one).
If that isn’t you, feel free to request a call and I may still take it.
Support for potential whistleblowers. If you're at a lab and aware of wrongdoing, I can put you in touch with:
Former lab employees and others who can offer confidential advice
Vetted employment lawyers
Communications professionals who can advise on talking to the media.
If you need this, email zacharysteinperlman at gmail or message me on Signal at 734 353 3975.
I don't know whether I'll offer this long-term, but I'll offer it for at least the next month.
My hope is that this service makes it much easier for lab employees to have an informed understanding of labs’ safety-relevant actions, commitments, and responsibilities.
If you want to help—e.g. if maybe I should introduce lab-people to you—let me know.
You can give me anonymous feedback.
Crossposted from AI Lab Watch. Subscribe on Substack.
Can you share any strong evidence that you’re an unusually trustworthy person in regard to confidential conversations? People would in fact be risking a lot by talking to you.
(This is sincere btw; I think this service should absolutely exist, but the best version of it is probably done by someone with a longstanding public reputation of circumspection.)
I trust Zach a lot and would be shocked if he maliciously or carelessly leaked info. I don’t believe he’s very experienced at handling confidential information, but I expect him to seek out advice as necessary and overall do a good job here. Happy to say more to anyone interested.
Good question.
I can’t really make this legible, no.
On the whistleblowing part, you should be able to get good advice without trusting me. It's publicly known that Kelsey Piper, plus (iirc) one or two of the ex-OpenAI folks, are happy to talk to potential whistleblowers. I should figure out exactly who that is and put their (publicly verifiable) contact info in this post (and, note to self, clarify whether or in what domains I endorse their advice vs merely want to make salient that it's available). Thanks.
[Oh, also ideally maybe I’d have a real system for anonymous communication.]
(On my-takes-on-lab-safety-stuff, it’s harder to substitute for talking-to-me but it’s much less risky; presumably talking to people-outside-the-lab about safety stuff is normal.)