The most common, these days, is some variant of “train an AI to help with aligning AI”. Sometimes it’s “train an AI to interpret the internals of another AI”, sometimes it’s “train an AI to point out problems in another AI’s plan”, sometimes it’s “train an AI to help you design aligned AI”, etc. I would guess about 75% of newcomers from ML suggest some such variant as their first idea.
I don’t think these are crazy or bad ideas at all; I’d be happy to steelman them with you at some point if you want. Certainly, we don’t know how to make any of them work right now, but I think they are all reasonable directions to go down if one wants to work on the various open problems related to them. The problem (and this is what I would say to somebody if they came to me with these ideas) is that they’re not so much “ideas for how to solve alignment” as “entire research topics unto themselves.”