Yeah, I don’t find “we can’t verify good alignment research” nearly as persuasive as other people around here:
Verification does seem way easier, even for alignment research. This is probably the most interesting and perplexing disagreement.
Even if verification isn’t easier than generation, an AI can still just do what a human researcher would have done, only faster. That seems like a big deal, and quite a lot of what early AI systems will be doing. Focusing only on generation vs verification seems to radically understate the case.
AI systems can also help with verification, e.g. noticing problems in proposed ideas, generating experimental setups in which to evaluate them, and so on. These tasks don’t seem especially hard to verify either.
You could imagine training an ML system end-to-end on “make the next ML system smarter.” But that’s not how things appear to be going (and it’s just really hard to do with gradient descent). Instead, it looks like ML systems will mostly be doing things like “solve subtasks identified by other humans or AIs” (just like most humans who work on these things). In this regime, having a reward function for the end result isn’t that important.
To the extent we can’t recognize good alignment research when we see it, I think that also makes humans’ alignment research less efficient, and so the comparative advantage question is less obvious than the absolute difficulty question.
I think that “we can’t verify good alignment research” is probably a smaller consideration than “alignment is more labor intensive while capabilities research is more capital intensive.” Neither is decisive, and I expect other factors will mostly dominate (like changes in allocation of labor).
This isn’t to say I think it’s easy to get AI systems to solve alignment for you, such that it doesn’t matter if you work on it in advance. But I’m not yet persuaded at all by “AI systems will be crazy superhuman before they make big contributions in alignment,” and don’t think that the LW community should particularly expect other folks to be persuaded either.