It looks like it’s totally plausible for many kinds of limited systems to greatly accelerate R&D
Do you have any concrete example from current alignment work where it would be helpful to have some future AI technology? Or any other way in which such technologies would be useful for alignment? Would it be something like “we trained it to listen to humans; it wasn’t perfect, but we looked at its utility function with transparency tools and that gave us ideas”? Oh, and it would need to be more useful for alignment than for creating something that can defeat the safety measures in place at the time, right? Because I don’t get how, for example, better hardware or coding assistance or money wouldn’t just result in faster development of something misaligned. And so I don’t get how everyone competing to develop AI helps matters: wouldn’t the existence of a half-as-capable AI before the one that ends the world just make the world-ending AI appear earlier? Sure, it could dramatically change things, but why would it change them for the better if no one planned for it?
I think that future AI technology could automate my job. I think it could also automate capability researchers’ jobs. (It could also help in lots of other ways, but this point seems sufficient to highlight the difference between our views.)
I don’t think my position requires the claim that AI will be more useful for alignment than for capabilities. We are talking about what we want our aligned AIs to do for us, and hence what we should have in mind while doing AI alignment research. If we think AI accelerates technological progress across the board, then the answer “we want our AI to keep accelerating the good stuff happening in the world at the same rate that it accelerates dangerous technology” seems valid.
And it will be OK for unaligned capabilities to exist because governments will stop them, perhaps using existing aligned AI technology, and they will do this in the future but not now because future AI technology will be better at demonstrating risk? Why do you expect humanity’s default response to an offense-defense balance that increasingly favors offense, and to growing vulnerability to terrorism, to be adequate? Why, for example, couldn’t capability detection still be insufficient by the time multiple actors reach world-destroying capabilities, leaving regulators unable to stop them?