These original warnings were always written from a framework that assumed the only way to make intelligence is RL. They are still valid for RL, but thankfully it seems that, at least for the time being, pure RL is not popular; I imagine that might have something to do with how obvious it is to everyone who tries pure RL that it's pretty hard to get it to do anything useful, for reasons that can reasonably be called alignment problems.
Imagine trying to get an AI to cure cancer entirely by RLHF, without even letting it learn language first. That’s how bad they thought it would be.
But RL setups do get used, and they do have generalization problems that connect to these warnings.