In some sense this already happens: as we train AI on more and more human-generated text, it gains both more capability and more alignment.
Yes, it does become easier to control and communicate with, but it does not become harder to make malicious. I'm not sure an AI scheme that can't be trivially flipped to evil is possible, but I would like to try to find one.