What do you think happens in a world where there is $100 billion in yearly alignment funding? How would researchers be making less progress? I want to note that even horrifically inefficient systems still produce more output than “uncorrupted” hobbyists—cancer research would produce far fewer results if it were done by 300 perfectly coordinated people, even if those 300 had zero ethical/legal restraints.
Let’s take cancer as an analogy for a moment. Suppose that, as a baseline, cancer research is basically-similar to other areas of medical research. Then, some politician comes along and declares “war on cancer”, and blindly pumps money into cancer research specifically. What happens? Well...
So even just from eyeballing that chart, it’s pretty plausible to me that if cancer funding dropped by a factor of 10, the net effect would be that clinical trial pass rates just return to comparable levels to other areas, and the actual benefits of all that research remain roughly-the-same.
… but that’s ignoring second-order effects.
Technical research fields have a “median researcher” problem: the memetic success of work in the field is not determined by the best researchers, but by the median researchers. Even if e.g. the best psychologists understand enough statistics to recognize crap studies, the median psychologist doesn’t (or at least didn’t 10 years ago), so we ended up with a field full of highly-memetically-successful crap studies which did not replicate (think Carol Dweck).
Back to cancer: if the large majority of the field is doing work which is predictably useless, then the field will develop standards for “success” which are totally decoupled from actual usefulness. (Note that the chart above doesn’t actually imply that most of the work done in the field is useless, let alone predictably useless; there’s not a one-to-one map between cancer research projects and clinical trials.) To a large extent, the new standards would be directly opposed to actual usefulness, in order to defend the entrenched researchers doing crap work—think Carol Dweck arguing that replication is a bad standard for psychology.
That’s the sort of thing I expect would happen if a government dumped $100B into alignment funding. There’d be a flood of people with nominally-alignment-related projects which are in fact basically useless for solving alignment; they would quickly balloon to 90+% of the field. With such people completely dominating the field, first memetic success and then grant money would mostly be apportioned by people whose standards for success are completely decoupled from actual usefulness for alignment. Insofar as anything useful got done, it would mostly be by people who figured out the real challenges of alignment for themselves, and had to basically hack the funding system in order to get money for their actually-useful work.
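The dilution dynamic above can be sketched with a toy simulation (all distributions and numbers here are invented purely for illustration, not estimates): a small field of mostly-useful projects absorbs a large funding-driven influx of mostly-useless ones, and the median project, which on the “median researcher” argument is what sets the field’s standards, ends up nearly useless.

```python
import random
import statistics

random.seed(0)

# Toy model: each project's "usefulness" is a number in [0, 1].
# The distributions below are made up for illustration only.
original_field = [random.betavariate(4, 2) for _ in range(300)]   # skewed toward useful
funding_influx = [random.betavariate(1, 9) for _ in range(2700)]  # skewed toward useless

median_before = statistics.median(original_field)
median_after = statistics.median(original_field + funding_influx)

# The influx is 90% of the post-funding field, so the median project
# after the influx is drawn almost entirely from the useless cluster.
print(f"median usefulness before influx: {median_before:.2f}")
print(f"median usefulness after influx:  {median_after:.2f}")
print(f"influx share of post-funding field: {2700 / 3000:.0%}")
```

The point of the sketch is just that the median collapses even though every originally-useful project is still there: nothing useful was destroyed, but the field’s “typical” work, and hence its standards, now looks like the influx.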
In the case of cancer, steady progress has been made over the years despite the mess; at the end of the day, clinical trials provide a good ground-truth signal for progress on cancer. Even if lots of shit is thrown at the wall, some of it sticks, and that’s useful. In alignment, one of the main frames for why the problem is hard is that we do not have a good ground-truth signal for whether we’re making progress. So all these problems would be much worse than usual, and it’s less likely to be the actually-useful shit which sticks to the metaphorical wall.
Many things here.
The issues you mention don’t seem tied to public versus private funding, but more to the size of funding plus an intrinsically difficult scientific question. I agree that at some point more funding doesn’t help. At the moment, that doesn’t seem to be the case in alignment. Indeed, alignment doesn’t even have as many researchers as a relatively small field like linguistics.
How well the funders understand the field, and can differentially target more-useful projects, is a key variable here. For public funding, the top-level decision maker is a politician; they will in the vast majority of cases have approximately-zero understanding themselves. They will either apportion funding on purely political grounds (e.g. pork-barrel spending), or defer to whoever the consensus “experts” are in the field (which is where the median researcher problem kicks in).
In alignment to date, the funders have generally been people who understand the problem themselves well enough to notice that it’s worth paying attention to (in a world where alignment concern wasn’t already mainstream), and can therefore differentially target useful work rather than blindly spray money around.
Seems overstated. Universities support all kinds of very specialized long-term research that politicians don’t understand.
From my own observations, and from talking with funders themselves, most funding decisions in AI safety are made on mostly superficial markers—grantmakers on the whole don’t dive deep on technical details. [In fact, I would argue that blindly spraying money around in a more egalitarian way (i.e. what SeriMATS has accomplished) is probably not much worse than the status quo.]
Academia isn’t perfect, but on the whole it gives a lot of bright people the time, space, and financial flexibility to pursue their own judgement. In fact, many alignment researchers have done a significant part of their work in an academic setting, or while supported in some way by public funding.
At first, I predicted you were going to say that public funding would accelerate capabilities research over alignment, but it seems like the gist of your argument is that lots of public funding would muddy the waters and sharply reduce the average quality of alignment research.
That might be true for theoretical AI alignment research, but I’d imagine it’s less of a problem for the types of AI alignment research that have decent feedback loops, like interpretability research and other kinds of empirical research such as experiments on RL agents.
One reason I’m skeptical is that there doesn’t seem to be a similar problem in the field of ML, which is huge, largely publicly funded (to the best of my knowledge), and still makes good progress. Possible reasons why the ML field is still effective despite its size include sufficient empirical feedback loops and the fact that top conferences reject most papers (~25% is a typical acceptance rate at NeurIPS).
Yeah, to be clear, acceleration of capabilities is a major reason why I expect public funding would be net negative, rather than just much closer to zero impact than naive multiplication would suggest.
Ignoring the capabilities issue, I think there’s lots of room for uncertainty about whether a big injection of “blind funding” would be net positive, for the reasons explained above. I think we should be pretty confident that the results would be an OOM or more less positive than the naive multiplication suggests, but that’s still not the same as “net negative”; the net positivity/negativity I see as much more uncertain (ignoring capabilities impact).
Accounting for capabilities impact, I think the net impact would be pretty robustly negative.
The parts where the bad feedback loops are, are exactly the places where the things-which-might-actually-kill-us are. Things we can see coming are exactly the things which don’t particularly need research to stop, and the fact that we can see them is exactly what makes the feedback loops good. It is not an accident that the feedback loop problem is unusually severe for the field of alignment in particular.
(Which is not to say that e.g. interpretability research isn’t useful—we can often get great feedback loops on things which provide a useful foundation for the hard parts later on. The point is that, if the field as a whole streetlights on things with good feedback loops, it will end up ignoring the most dangerous things.)