I agree that in a fast takeoff scenario there’s little reason for an AI system to operate within existing societal structures, as it can outgrow them more quickly than society can adapt. I’m personally fairly skeptical of fast takeoff (<6 months, say) but quite worried that society may be slow enough to adapt that even years of gradual progress, with clear signs that transformative AI is on the horizon, may be insufficient.
In terms of humans “owning” the economy but still having trouble getting what they want, it’s not obvious this is a worse outcome than the society we have today. Indeed this feels like a pretty natural progression of human society. Humans already interact with (and not so infrequently get tricked or exploited by) entities smarter than them such as large corporations or nation states. Yet even though I sometimes find I’ve bought a dud on the basis of canny marketing, overall I’m much better off living in a modern capitalist economy than the stone age where humans were more directly in control.
However, it does seem like there’s a lot of value lost in the scenario where humans become increasingly disempowered, even if their lives are still better than in 2022. From a total utilitarian perspective, “slightly better than 2022” and “all humans dead” are rounding errors relative to “possible future human flourishing”. But things look quite different under other ethical views, so I’m reluctant to conflate these outcomes.
I’m excited by many of the interventions you describe, but largely for reasons other than buying time. I’d expect buying time to be quite hard, insofar as it requires coordinating to get many actors to stop doing something they’re incentivized to do. By contrast, since the alignment research community is small, doubling it is relatively easy. It’s ultimately a point in favor of these interventions that they look promising under multiple worldviews, but this might lead me to prioritize among them differently than you do.
One area I would push back on: the skills you describe as being valuable for “buying time” read like a laundry list for success in research in general, especially empirical ML research.
It seems pretty bad for the people strongest at empirical ML research to stop doing alignment research. Even if we pessimistically assume that empirical research now is useless (which I’d strongly disagree with), surely we need excellent empirical ML researchers to actually implement the ideas that the people who can “generate and formalize novel ideas” come up with. A few aspects of the list (like communication skills) do seem to differentially favor “buying time”—maybe have a shorter, more curated list in future?
Separately, given your fairly expansive list of things that “buy time,” I’d estimate that close to 50% of the alignment community is already doing this—even if they believe their primary route to impact is more direct. For example, I think most people working on safety at AGI labs would count under your definition: they can help convince decision-makers in the lab not to deploy unsafe AI systems, buying us time. A lot of the work on safety benchmarks or empirical demonstrations of failure modes falls into this category as well. Personally, I’m concerned that people fall into this category of work by default and that there’s too much of it, although when done well it can be very powerful.