I agree that 1-3 need more attention, thanks for raising them.
Many AI scientists in the 1950s and 1960s incorrectly expected that cracking computer chess would automatically crack other tasks as well.
There’s a simple disconnect here between chess and self-supervised learning. You’re probably aware of it, but it’s worth mentioning. Chess algorithms were historically designed to win at chess, and nothing else. In contrast, the whole point of self-supervised learning is to extract representations that are useful in general. For example, to solve a new task we can feed the representations into a linear regression, itself another general-purpose algorithm. ML researchers have argued for ages that this should work, and we already have plenty of evidence that it does.
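For concreteness, here’s a minimal sketch of the “frozen representations plus linear model” recipe I have in mind. Everything in it is a placeholder: `encoder` stands in for whatever pretrained self-supervised model you like (here it’s just a random projection so the snippet runs on its own), and the probe is scikit-learn’s `LinearRegression`.

```python
# Sketch: fit a linear probe on frozen representations from a pretrained encoder.
# The encoder and data below are toy placeholders, not a real pretrained model.
import numpy as np
from sklearn.linear_model import LinearRegression

def linear_probe(encoder, X_train, y_train, X_test, y_test):
    """Fit a linear model on frozen representations and return test R^2."""
    # Extract representations; the encoder itself is never updated.
    Z_train = encoder(X_train)   # shape: (n_train, d)
    Z_test = encoder(X_test)     # shape: (n_test, d)

    probe = LinearRegression()
    probe.fit(Z_train, y_train)
    return probe.score(Z_test, y_test)

# Toy usage: a random projection pretends to be the pretrained encoder.
rng = np.random.default_rng(0)
W = rng.normal(size=(32, 8))
encoder = lambda X: X @ W
X = rng.normal(size=(200, 32))
y = X @ rng.normal(size=32) + 0.1 * rng.normal(size=200)
print(linear_probe(encoder, X[:150], y[:150], X[150:], y[150:]))
```

The point isn’t this particular code, just that the downstream task only ever sees the representations, so anything that makes them better helps every task at once, which is exactly the property chess programs never aimed for.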