The amount of effort going into AI as a whole ($10s of billions per year) is currently ~2 orders of magnitude larger than the amount of effort going into the kind of empirical alignment work I’m proposing here, and at least in the short term (given excitement about scaling), I expect it to grow faster than investment in that alignment work.
There’s a reasonable argument (shoutout to Justin Shovelain) that work such as this, done by AI alignment people, will be closer to AGI than the work done in standard commercial or academic research, and will therefore accelerate AGI more than average AI research would. If so, $10s of billions per year into general AI is not quite the right comparison, because little of that money goes to matters “close to AGI”.
That said, on balance, I’m personally in favor of the work this post outlines.
I’m personally skeptical that this work is better-optimized for improving AI capabilities than other work being done in industry. In general, I’m skeptical of the view that the work the rationalist/EA/alignment crowd does Pareto-dominates the other work going on; that is, that it’s significantly better for both alignment and capabilities than standard work, such that others are simply making a mistake by not working on it, regardless of what their goals are or how much they care about alignment. I think this could sometimes be the case, but I wouldn’t bet on it being a large effect. In general, I expect work optimized to help with alignment to be worse on average at pushing forward capabilities, and vice versa.