some random takes:
you didn’t say this, but when I saw the infrastructure point I was reminded that some people seem to have a notion that any ML experiment you can do outside a lab can be done more efficiently inside a lab because of some magical experimentation infrastructure or something. I think unless you’re spending 50% of your time installing CUDA or something, this basically is just not a thing. lab infrastructure lets you run bigger experiments than you could otherwise, but it costs a few sanity points compared to running the same small experiment outside. oftentimes, the most productive way to work inside a lab is to avoid the existing software infra as much as possible.
I think safetywashing is a problem, but from the perspective of an xrisk-focused researcher it’s not a big deal, because for the audiences that matter, there are safetywashing options that are just way cheaper per unit of goodwill than xrisk alignment work—xrisk is kind of weird and unrelatable to anyone who doesn’t already take it super seriously. I think people who work on non-xrisk safety or distribution-of-benefits stuff should be more worried about this.
this is totally n=1, and in fact I think my experience here is quite unrepresentative of the average lab experience, but I’ve had a shocking amount of research freedom. I’m deeply grateful for this—it has turned out to be incredibly positive for my research productivity (e.g. the SAE scaling paper would not have happened otherwise).
Weird as it may be, it is also somewhat influential among people who matter. The extended LW-sphere is not without influence, and it also contains good ML talent for the recruiting pool. I can easily see the case that places like Anthropic/Deepmind/OpenAI[1] benefit from giving the appearance of caring about xrisk and working on it.
[1] until recently
(responding only to the first point)
It is possible to do experiments more efficiently in a lab because you have privileged access to top researchers whose bandwidth is otherwise very constrained. If you ask for help in Slack, the quality of the responses tends to be comparable to what you’d get from teams outside labs, but the responses often come faster, because the hiring process selects strongly for speed. It can be hard to coordinate busy schedules, but once you have a collaborator’s attention, what they say will make sense and be helpful. People at labs tend to be unusually good communicators, so it is easier to understand what they mean in meetings, whiteboard sessions, or 1:1s; this is unfortunately not universal amongst engineers. It’s also rarer for projects to be managed in an unfocused way that leaves them fizzling out without adding value, and feedback usually leads to improvement rather than deadlock over disagreements.
Also, lab culture in general benefits from high levels of executive function. For instance, when a teammate says they spent an hour working on a document, you can be confident that progress has been made, even if not all the changes pass review; it’s less likely that they suffered from writer’s block or got distracted by a lower-priority task. Some of these factors also apply at well-run startups, but startups don’t have the same branding, and it’d be difficult for a startup to, e.g., line up four reviewers of this calibre: https://assets.anthropic.com/m/24c8d0a3a7d0a1f1/original/Alignment-Faking-in-Large-Language-Models-reviews.pdf.
I agree that (without loss of generality) the internal RL code isn’t going to blow open-source repos out of the water, and if you want to iterate on a figure or plot, that’s the same amount of work no matter where you are, even if you have experienced people helping you make better decisions. But you’re missing that lab infra doesn’t just let you run bigger experiments; it also lets you run more small experiments, because compute resourcing per researcher at labs is quite high by non-lab standards. When I was at Microsoft, it wasn’t uncommon for some teams to have the equivalent of roughly 2 V100s, which is less than what students can rent from vast or runpod for personal experiments.
I agree that labs have more compute and more top researchers, and these both speed up research a lot. I disagree that the quality of responses is the same as outside labs, if only because there is lots of knowledge inside labs that’s not available elsewhere. I think these positive factors are mostly orthogonal to the quality of software infrastructure.