Your point about neural nets NEVER being able to extrapolate is wrong. NNs are universal function approximators. A sufficiently large NN with the right weights can therefore approximate the “extrapolation” function (or even approximate whatever extrapolative model you’re training in place of an NN).
I’m pretty sure this is wrong. The universal approximator theorems I’ve seen work by interpolating the function they are fitting; this doesn’t guarantee that they can extrapolate.
In fact, it seems to me that a universal approximation theorem can never show that a network is capable of extrapolating, because the approximation guarantee has to hold for all possible functions, while extrapolation inherently involves guessing one specific function.
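To make this concrete, here is a toy sketch (purely illustrative, assuming scikit-learn and numpy; nothing in the argument depends on these specifics): fit a small ReLU network to sin(x) on [-π, π], then query it well outside that interval. The approximation guarantee only speaks to the fitted region; outside it a ReLU net just continues linearly, so it won’t keep producing a sine wave.

```python
# Toy sketch (my own example, not from the original discussion):
# universal approximation lets the net fit sin(x) on the training interval,
# but says nothing about what it does beyond that interval.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(-np.pi, np.pi, size=(2000, 1))   # interpolation region
y_train = np.sin(x_train).ravel()

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
net.fit(x_train, y_train)

x_test = np.linspace(2 * np.pi, 3 * np.pi, 5).reshape(-1, 1)  # extrapolation region
print(np.c_[x_test, net.predict(x_test), np.sin(x_test)])
# In-range predictions track sin(x) closely; out-of-range predictions
# typically flatten out or drift rather than continuing the wave.
```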
There’s nothing in there that NNs are fundamentally incapable of doing.
The post isn’t talking about fundamental capabilities, it is talking about “today’s methods”.
I was thinking of the NN approximating the “extrapolate” function itself, that thing which takes in partial data and generates an extrapolation. That function is, by assumption, capable of extrapolation from incomplete data. Therefore, I expect a sufficiently precise approximation to that function is able to extrapolate.
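To illustrate what I mean (a toy sketch with scikit-learn; the arithmetic-sequence task and all names are my own, chosen only to make the point concrete): train a small network on (prefix of a sequence → next term) pairs drawn from many random arithmetic sequences. The learned input-output map is then itself an extrapolation rule, and it extrapolates sequences the network never saw during training.

```python
# Toy sketch of "approximating the extrapolate function": the network learns
# a map from partial data (first 4 terms) to a continuation (the 5th term),
# so applying the trained net IS performing extrapolation on new inputs.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def make_batch(n):
    start = rng.uniform(-5, 5, size=(n, 1))
    step = rng.uniform(-2, 2, size=(n, 1))
    seq = start + step * np.arange(5)        # columns: a, a+d, ..., a+4d
    return seq[:, :4], seq[:, 4]             # prefix -> next term

X, y = make_batch(20000)
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(X, y)

X_new, y_new = make_batch(5)                 # sequences never seen in training
print(np.c_[net.predict(X_new), y_new])      # predictions track a + 4d closely
```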
It may be helpful to argue from Turing completeness instead. Transformers are Turing complete. If “extrapolation” is Turing computable, then there’s a transformer that implements extrapolation.
Also, the post is talking about what NNs are ever able to do, not just what they can do now. That’s why I thought it was appropriate to bring up theoretical computer science.