What do you think of a claim like “most of the intelligence comes from the steps where you do most of the optimization”? A corollary of this is that we particularly want to make sure the optimization-intensive steps of AI creation are safe with respect to not producing intelligent programs devoted to killing us.
This seems probably right to me.
Example: most of the “intelligence” of language models comes from the self-supervised pretraining step. However, it’s in-principle plausible that we could design e.g. some really capable general-purpose reinforcement learner where the intelligence comes from the reinforcement, and such a learner could (but wouldn’t necessarily) internalise “agenty” behaviour.
I agree that reinforcement learners seem more likely to be agent-y (and therefore scarier) than self-supervised learners.