I feel like a lot of these arguments could pretty easily be made about individual AI safety researchers. E.g.
Misaligned Incentives
In much the same way that AI systems may have perverse incentives, so do the [AI safety researchers]. They are [humans]. They need to make money, [feed themselves, and attract partners]. [Redacted and redacted even just got married.] This type of accountability to [personal] interests is not perfectly in line with doing what is good for human interests. Moreover, [AI safety researchers are often] technocrats whose values and demographics do not represent humanity particularly well. Optimizing for the goals that the [AI safety researchers] have is not the same thing as optimizing for human welfare. Goodhart’s Law applies.
I feel pretty similarly about most of the other arguments in this post.
To be clear, I think there are plenty of things one could reasonably critique scaling labs for; I just think the argumentation in this post is by and large off the mark, and it implies a standard that, if actually taken literally, would be a similarly damning critique of the alignment community.
(Conflict of interest notice: I work at Google DeepMind.)
Thanks. I agree that the points apply to individual researchers. But I don’t think they apply in a comparably worrisome way, because individual researchers do not have intelligence, money, or power comparable to the labs’. This is me stressing the “when put under great optimization pressure” part of Goodhart’s Law: subtle misalignments are much less dangerous when there is a weak optimization force behind the proxy than when there is a strong one.
It makes a lot of sense that misaligned organizations are more dangerous than misaligned individuals because of power differences. At the same time, some individuals are pretty powerful, and we should be concerned about their actions too, just as we are about the labs’. Pinboard made this argument about Sam Altman back in 2016 already https://x.com/Pinboard/status/1792945916241916036/photo/1 and I guess it has only become more relevant since.