I’m excited by many of the interventions you describe, but largely for reasons other than buying time. I’d expect buying time to be quite hard, insofar as it requires coordinating to get many actors to stop doing something they’re incentivized to do. By contrast, since the alignment research community is small, doubling it is relatively easy. It’s ultimately a point in favor of the interventions that they look promising under multiple worldviews, but it might lead me to prioritize among them differently than you do.
One area I would push back on: the skills you describe as being valuable for “buying time” seem like a laundry list for success in research in general, especially empirical ML research:
Skills that seem uniquely valuable for buying time interventions: general researcher aptitudes, ability to take existing ideas and strengthen them, experimental design skills, ability to iterate in response to feedback, ability to build on the ideas of others, ability to draw connections between ideas, experience conducting “typical ML research,” strong models of ML/capabilities researchers, strong communication skills
It seems pretty bad for the people strongest at empirical ML research to stop doing alignment research. Even if we pessimistically assume that empirical research now is useless (which I’d strongly disagree with), surely we need excellent empirical ML researchers to actually implement the ideas you hope the people who can “generate and formalize novel ideas” come up with. A few aspects of this list (like communication skills) do seem to differentially point in favor of “buying time”; maybe have a shorter, more curated list in future?
Separately, given your fairly expansive list of things that “buy time”, I’d have estimated that close to 50% of the alignment community is already doing this, even if they believe their primary route to impact is more direct. For example, I think most people working on safety at AGI labs would count under your definition: they can help convince decision-makers in the lab not to deploy unsafe AI systems, buying us time. A lot of the work on safety benchmarks or empirical demonstrations of failure modes falls into this category as well. Personally, I’m concerned that people are falling into this category of work by default and that there’s too much of it, although I do think that, when done well, it can be very powerful.