So you want to align the AI with us rather than its user by choosing the alignment approach it uses. If it’s corrigible towards its user, won’t it acquire the capabilities of the other approach in short order to better serve its user? Or is retrofitting the other approach also a blind spot of your proposed approach?
If it’s corrigible towards its user, won’t it acquire the capabilities of the other approach in short order to better serve its user?
Yes, that seems like an issue.
Or is retrofitting the other approach also a blind spot of your proposed approach?
That’s one possible solution. Another might be to create an aligned AI that is especially good at coordinating with other AIs, so that these AIs can agree with each other not to develop nuclear weapons before any of them invents an AI that is especially good at developing them. (But would corrigibility imply that the user can always override such agreements?) There may be other solutions that I’m not thinking of. If all else fails, it may be that the only way to avoid AI-caused differential intellectual progress in a bad direction is to stop the development of AI.