Here is a sceptical take: anyone who is prone to being convinced by this post to switch from trying to do technical AI safety to attempting “buying time” interventions is pretty likely not a good fit to try any high-powered buying-time interventions.
The whole thing reads a bit like “AI governance” and “AI strategy” reinvented under a different name, seemingly without bothering to find out what the current understanding is.
Figuring out that AI strategy and governance are maybe important, in late 2022, after spending substantial time on AI safety, does not seem to be what I would expect from promising AI strategists. Apparent lack of coordination with people already working in this area does not seem like a promising sign from people who would like to engage with hard coordination problems.
Also, I’m worried about suggestions like
Concretely, we think that roughly 80% of alignment researchers are working on directly solving alignment. We think that roughly 50% should be working on alignment, while 50% should be reallocated toward buying time.
We also think that roughly 90% of (competent) community-builders are focused on “typical community-building” (designed to get more alignment researchers). We think that roughly 70% should do typical community-building, and 30% should be buying time.
...could easily be counterproductive.
What is and would be really valuable are people who understand both the so-called “technical problem” and the so-called “strategy problem”. (Secretly, the two have more in common than people think.)
What is not only not valuable, but easily harmful, would be an influx of people who understand neither, but who engage with the strategy domain instead of the technical domain.
The whole thing reads a bit like “AI governance” and “AI strategy” reinvented under a different name, seemingly without bothering to find out what the current understanding is.
I overall agree with this comment, but do want to push back on this sentence. I don’t really know what it means to “invent AI governance” or “invent AI strategy”, so I don’t really know what it means to “reinvent AI governance” or “reinvent AI strategy”.
Separately, I also don’t really think it’s worth spending a ton of time trying to really understand what current people think about AI governance. Like, I think we are mostly confused, it’s a very green-field situation, and it really doesn’t seem to me to be the case that you have to read the existing stuff to helpfully contribute. Also, a really large fraction of the existing stuff is actually just political advocacy dressed up as inquiry, and I think many people are better off not reading it (like, the number of times I was confused about a point of some AI governance paper and the explanation turned out to be “yeah, the author didn’t really believe this, but saying it would advance their political goals of being taken more seriously, or gaining reputation, or allowing them to later change policy in some different way” is substantially larger than the number of times I learned something helpful from these papers, so I do mostly recommend staying away from them).
I overall agree with this comment, but do want to push back on this sentence. I don’t really know what it means to “invent AI governance” or “invent AI strategy”, so I don’t really know what it means to “reinvent AI governance” or “reinvent AI strategy”.
By reinventing it, I mean, for example, asking questions like “how to influence the dynamic between AI labs in a way which allows everyone to slow down at a critical stage”, “can we convince some actors about AI risk without the main effect being that they put more resources into the race”, “what’s up with China”, “how to generally slow down things when necessary” and similar, and attempting to answer them.
I do agree that reading a lot of policy papers is of limited use for directly forming hypotheses: in my experience the more valuable parts often have the form of private thoughts or semi-privately shared thinking.
On the other hand… in my view, if people have a decent epistemic base, they often should engage with the stuff you dislike, but from a proper perspective: not “this is the author attempting to literally communicate what they believe”, but more of “this is a written speech-act which probably makes some sense and has some purpose”. In other words… people who want to work on strategy unfortunately eventually need to be able to operate in epistemically hostile environments. They should train elsewhere, and spend enough time elsewhere to stay sane, but they need to understand e.g. how incentive landscapes influence what people think and write, and it is not possible to get good at this without dipping your feet in the water.
I’m sympathetic under some interpretations of “a ton of time,” but I think it’s still worth people’s time to spend at least ~10 hours of reading and ~10 hours of conversation getting caught up with AI governance/strategy thinking, if they want to contribute.
Arguments for this:
Some basic ideas/knowledge that the field is familiar with (e.g. on the semiconductor supply chain, antitrust law, immigration, US-China relations, how relevant governments and AI labs work, the history of international cooperation in the 20th century) seem really helpful for thinking about this stuff productively.
First-hand knowledge of how relevant governments and labs work is hard/costly to get on one’s own.
Lack of shared context makes collaboration with other researchers and funders more costly.
Even if the field doesn’t know that much and lots of papers are more advocacy pieces, people can learn from what the field does know and read the better content.
Yeah, totally, 10 hours of reading definitely seems worth it, and, like, many hours of conversation too, if only because those hours of conversation will probably just help you think through things yourself.
I also think it does make a decent amount of sense to coordinate with existing players in the field before launching new initiatives and doing big things, though I don’t think it should be a barrier before you suggest potential plans, or discuss potential directions forward.
Do you have links to stuff you think would be worthwhile for newcomers to read?
Yep! Here’s a compilation.
If someone’s been following along with popular LW posts on alignment and is new to governance, I’d expect them to find the “core readings” in “weeks” 4-6 most relevant.
Thanks!
I agree that “buying time” isn’t a very useful category. Some thoughts on the things which seem to fall under the “buying time” category:
Evaluations
I think people should mostly consider this as a subcategory of technical alignment work, in particular the work of understanding models. The best evaluations will include work that’s pretty continuous with ML research more generally, like fine-tuning on novel tasks, developing new prompting techniques, and application of interpretability techniques (see the sketch after this list for a toy illustration of what such an evaluation harness could look like).
Governance work, some subcategories of which include:
Lab coordination: work on this should mainly be done in close consultation with people already working at big AI labs, in order to understand the relevant constraints and opportunities
Policy work: see standard resources on this
Various strands of technical work which is useful for the above
Outreach
One way to contribute to outreach is doing logistics for outreach programs (like the AGI safety fundamentals course)
Another way is to directly engage with ML researchers
Both of these seem very different from “buying time”—or at least “outreach to persuade people to become alignment researchers” doesn’t seem very different from “outreach to buy time somehow”
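To illustrate the sense in which evaluation work is continuous with ordinary ML research (the “Evaluations” point above), here is a minimal sketch of a prompting-based evaluation harness. This is only a toy example under stated assumptions: query_model is a hypothetical placeholder for whatever API the evaluated model sits behind, and the tasks and prompt templates are made up for illustration.

```python
# Toy sketch of a prompting-based evaluation harness (illustrative only).
# `query_model` is a hypothetical stand-in for whatever API serves the model
# being evaluated; the tasks and templates are made-up examples.

def query_model(prompt: str) -> str:
    """Placeholder for a call to the model under evaluation."""
    raise NotImplementedError("Wire this up to a real model API.")

# Each task is a (question, expected substring of a correct answer) pair.
TASKS = [
    ("What is 17 + 25?", "42"),
    ("Reverse the string 'abc'.", "cba"),
]

# Two prompt templates, to compare how phrasing affects measured capability.
TEMPLATES = {
    "plain": "{question}\nAnswer:",
    "step_by_step": "{question}\nThink step by step, then give the final answer.\nAnswer:",
}

def run_eval(template: str, tasks=TASKS) -> float:
    """Return the fraction of tasks whose expected answer appears in the model output."""
    hits = 0
    for question, expected in tasks:
        completion = query_model(template.format(question=question))
        hits += int(expected in completion)
    return hits / len(tasks)

if __name__ == "__main__":
    for name, template in TEMPLATES.items():
        try:
            print(f"{name}: {run_eval(template):.2f}")
        except NotImplementedError as err:
            print(f"{name}: skipped ({err})")
```

The point is just that the scaffolding (prompting, scoring, comparing conditions) looks like everyday ML experimentation, which is part of why it seems more natural to treat evaluations as technical alignment work than as a separate “buying time” category.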
Thomas Kuhn argues in The Structure of Scientific Revolutions that research fields that focus directly on real-world problems usually make little progress. In his view, a field like physics, where researchers in the 20th century focused on very theoretical questions, produced a lot of progress.
When physicists work to advance the field of physics, they look for possible experiments that can be done to increase the body of physics knowledge. A good portion of those experiments doesn’t have any apparent real-world impact, but they lead to accumulated knowledge in the field.
In contrast, a field like nutrition research tries to focus on questions that are directly relevant to how people eat. As a result, it doesn’t focus on fundamental research that leads to accumulated knowledge.
Given that background, I think that we want a good portion of alignment researchers not laser-focussed on specific theories of change.
I think that being driven by curiosity is different than being driven by a theory of change. Physicists are curious about figuring out any gaps in their models.
Feynman is also a good example. He hit a roadblock and couldn’t really think of how to work on any important problem, so he decided to play around with the math of physical problems.
If someone is curious about solving some problem in the broad sphere of alignment, it might be bad if they have to think about whether or not that actually helps with specific theories of change.
On the funding level, a theory of change might apply: you give money to a researcher who is able to solve some unsolved problem because solving it progresses the field, even if you have no idea whether the problem is worth solving. On the researcher level, it’s more about being driven by curiosity.
At the same time, it’s also possible for the curiosity-driven approach to lead a field astray. Virology would be a good example. Virologists are curious about problems they can approach with the molecular-biology toolkit but aren’t very curious about the problems you need to tackle to understand airborne transmission.
Virologists also do dangerous gain-of-function experiments that are driven by curiosity to understand things about viruses but are disconnected from any theory of change.
When approaching them, a good stance might be: “If you do dangerous experiments, then you need a theory of change”, while the non-dangerous experiments that are driven by curiosity to understand viruses should get decent funding even without a theory of change. You also want some people who actually think about theory of change and go: “If we want to prevent a pandemic, understanding how virus transmission works is important, so we should probably fund experiments for that even if they aren’t the experiments toward which our curiosity drives us.”
Going back to the alignment question: you want to fund people who are curious about solving problems in the field, and to the extent that solving those problems isn’t likely to result in huge capability increases, they should have decent funding. When it comes to working on problems that are about capability increases, you should require the researchers to have a clear theory of change. You also want some people to focus on a theory of change and seek out problems based on it.
Thinking too hard about whether what you are doing is actually helping can lead to getting nothing done at all, as Feynman experienced before he switched his approach.
It’s possible that at the present state of the AI safety field there are no clear roads to victory. If that’s the case, then you would want to grow the AI safety field more by letting researchers do curiosity-driven research.
The number of available researchers doesn’t tell you what’s actually required to make progress.