I expect that these kinds of problems could mostly be solved by scaling up data and compute (although I haven’t read the paper). However, the argument in the post is that even if we did scale up, we couldn’t solve the OOD generalisation problems.
I expect that these kinds of problems could mostly be solved by scaling up data and compute (although I haven’t read the paper). However, the argument in the post is that even if we did scale up, we couldn’t solve the OOD generalisation problems.