I wrote a few posts on self-supervised learning last year:
https://www.lesswrong.com/posts/SaLc9Dv5ZqD73L3nE/the-self-unaware-ai-oracle
https://www.lesswrong.com/posts/EMZeJ7vpfeF4GrWwm/self-supervised-learning-and-agi-safety
https://www.lesswrong.com/posts/L3Ryxszc3X2J7WRwt/self-supervised-learning-and-manipulative-predictions
I’m not aware of any airtight argument that “pure” self-supervised learning systems, either generically or with any particular architecture, are safe to use at arbitrary levels of intelligence, though it seems very much worth someone trying to prove or disprove that. For my part, I got distracted by other things and haven’t thought about it much since then.
The other issue is whether “pure” self-supervised learning systems would be capable enough to satisfy our AGI needs, or to safely bootstrap to systems that are. I go back and forth on this. I wrote up one side of the argument here. The other side is that I’m now (vaguely) thinking people need a reward system to decide what thoughts to think, and that the fact that GPT-3 doesn’t need reward is not evidence that reward is unimportant, but rather evidence that GPT-3 is nothing like an AGI. Well, maybe.
For humans, self-supervised learning forms the latent representations, but the reward system controls action selection. It’s not altogether unreasonable to think that action selection, and hence reward, is a more important thing to focus on for safety research: to a first approximation, AGIs are dangerous when they take dangerous actions. The fact that a larger fraction of neocortical synapses is adjusted by self-supervised learning than by reward learning is interesting and presumably safety-relevant, but I don’t think it immediately proves that self-supervised learning holds a similarly large fraction of the answers to AGI safety questions. Maybe, maybe not; it’s not immediately obvious. :-)
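To make that division of labor concrete, here’s a minimal toy sketch (my own illustration, not anything from the posts above; the environment, reward rule, and all names are made up): a self-supervised “cortex” learns latent representations by predicting the next observation, with no reward anywhere in that update, while a separate reward-driven component reads those latents and learns which actions to take.

```python
# Hypothetical sketch: self-supervised representation learning vs. reward-driven
# action selection, kept as two separate update rules. Toy numbers throughout.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, N_ACTIONS, LR = 8, 4, 3, 0.01

# --- Self-supervised part: linear encoder + next-observation predictor ---
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))   # obs -> latent
W_pred = rng.normal(scale=0.1, size=(OBS_DIM, LATENT_DIM))  # latent -> predicted next obs

def encode(obs):
    return W_enc @ obs

def ssl_update(obs, next_obs):
    """Reduce next-observation prediction error. No reward signal is used here."""
    global W_enc, W_pred
    z = encode(obs)
    err = (W_pred @ z) - next_obs                 # prediction error
    grad_pred = np.outer(err, z)                  # grad of 0.5*||err||^2 w.r.t. W_pred
    grad_enc = np.outer(W_pred.T @ err, obs)      # grad w.r.t. W_enc
    W_pred -= LR * grad_pred
    W_enc -= LR * grad_enc

# --- Reward-driven part: softmax policy over actions, reading the latents ---
W_pol = np.zeros((N_ACTIONS, LATENT_DIM))

def select_action(z):
    logits = W_pol @ z
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(N_ACTIONS, p=p), p

def reward_update(z, action, p, reward):
    """REINFORCE-style update: only this part ever sees the reward."""
    global W_pol
    grad = -p[:, None] * z[None, :]               # d log pi(action|z) / d W_pol
    grad[action] += z
    W_pol += LR * reward * grad

# --- Toy loop: predictable dynamics, arbitrary reward rule ---
obs = rng.normal(size=OBS_DIM)
for step in range(1000):
    z = encode(obs)
    action, p = select_action(z)
    next_obs = np.roll(obs, 1) + 0.01 * rng.normal(size=OBS_DIM)
    reward = 1.0 if action == 0 else 0.0
    ssl_update(obs, next_obs)             # representation learning: reward-free
    reward_update(z, action, p, reward)   # action selection: reward-driven
    obs = next_obs
```

The only point of the sketch is the separation: `ssl_update` never touches reward, and `reward_update` only reads the latents, so all the “what to actually do” pressure flows through the reward-trained part.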