I suspect the difference is mostly in what training opportunities are available, not in what type of system is used internally.
In principle, a strong NLP AI might learn some behaviour that manipulates humans. In practice, though, it is much harder for it to do so, because almost all of the training phase involves no interaction at all. The model's outputs are decoupled from its future inputs, so there is no training signal that could reward any ability to manipulate whoever supplies those inputs.
In reality there are some interactive side-channels, such as fine-tuning data being selected on the basis of human evaluation. A sufficiently powerful system might learn enough from those to manipulate the world, but that seems much less likely than some other type of system, with more interactive learning, getting there first.
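To make the contrast concrete, here is a minimal Python sketch of the two data flows. All the names here (`offline_training`, `interactive_training`, the callables standing in for the model and the human rater) are hypothetical illustrations, not any real training API:

```python
from typing import Callable, List, Tuple


def offline_training(model_update: Callable,
                     corpus: List[Tuple[str, str]]) -> None:
    """Standard self-supervised pretraining loop (hypothetical sketch).

    The corpus is fixed in advance, so nothing the model emits can
    alter what it is subsequently trained on.
    """
    for prompt, target in corpus:
        # Loss compares the model's output to a pre-recorded target;
        # the output itself is discarded and never reaches a human
        # or the dataset. No feedback loop exists to reward
        # manipulation of the input distribution.
        model_update(prompt, target)


def interactive_training(model_generate: Callable,
                         model_update: Callable,
                         human_rates: Callable,
                         prompts: List[str]) -> None:
    """RLHF-like side-channel loop (hypothetical sketch).

    A human evaluates the model's own output, and that judgment
    feeds back into training, so the output *does* influence the
    training signal.
    """
    for prompt in prompts:
        output = model_generate(prompt)
        reward = human_rates(prompt, output)  # human in the loop
        # This is the loop that could, in principle, exert gradient
        # pressure toward outputs that manipulate the evaluator.
        model_update(prompt, output, reward)
```

The structural difference is just which loop closes: only the second one feeds the model's output back into the signal it is optimised against, which is what the side-channel worry above is pointing at.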