His answer to “But won’t an AI research assistant speed up AI progress more than alignment progress?” seems to be “yes it might, but that’s going to happen anyway so it’s fine”, without addressing what makes this fine at all. Sure, if we already have AI research assistants that are greatly pushing forward AI progress, we might as well try to use them for alignment. I don’t disagree there, but this is a strange response to the concern that the very tool OpenAI plans to use for alignment may hurt us more than help us.
Good post! I think there’s a pessimistic reading of the situation that explains some of this evidence well. I don’t literally believe it, but I think it may be useful. It goes as follows:
The internal culture at OpenAI regarding alignment is bleak: most people are not very concerned with alignment and are pretty excited about building powerful AI systems. This train is already moving with a lot of momentum, and the alignment team doesn’t have much sway over its speed or overall direction. However, there is hope in one place: you can try to develop systems that let OpenAI easily produce more alignment research down the line, by figuring out how to automate alignment research. Then, if company culture shifts (e.g., because of warning shots), the option exists to quickly automate lots of alignment work.
To be clear, I think this is a bad plan, but it might be relatively good given the situation, if the situation really is that bleak. And if you think alignment isn’t that hard, then it’s plausibly a fine plan. If you’re a random alignment researcher bringing your skills to bear on the alignment problem, this is far from the first plan I would go with; but if you’re one of the few people with a seat on the OpenAI train, this might be among your best shots at success (though I think people in such a position should be quite pessimistic, and openly so).
I think this view does a good job of explaining the following quote from Jan:
Once we reach a significant degree of automation, we can much more easily reallocate GPUs between alignment and capability research. In particular, whenever our alignment techniques are inadequate, we can spend more compute on improving them. Additional resources are much easier to request than requesting that other people stop doing something they are excited about but that our alignment techniques are inadequate for.
Yeah, that last quote is pretty worrying. If the alignment team doesn’t have the political capital / support of leadership within the org to get people to stop pursuing particular projects or development pathways, I am even more pessimistic about OpenAI’s trajectory. I hope that changes!