I think that future AI technology could automate my job. I think it could also automate capability researchers’ jobs. (It could also help in lots of other ways, but this point seems sufficient to highlight the difference between our views.)
I don’t think that being more useful for alignment is a necessary claim for my position. We are talking about what we want our aligned AIs to do for us, and hence what we should have in mind while doing AI alignment research. If we think AI accelerates technological progress across the board, then the answer “we want our AI to keep accelerating good things happening in the world at the same rate that it accelerates dangerous technology” seems valid.
And it will be OK to have unaligned capabilities, because governments will stop them, perhaps using existing aligned AI technology, and they will do so in the future but not now because future AI technology will be better at demonstrating risk? Why do you think humanity’s default response to a shifting offense-defense balance and growing vulnerability to terrorism will be correct? Why, for example, couldn’t capability detection be insufficient by the time multiple actors arrive at world-destroying capabilities, leaving regulators unable to stop them?