I don’t think you can get the information that computing the gradient updates to particular weights would give you without actually running that computation (or something equivalent to it).
And presumably one would need empirical feedback (i.e. the value of the objective function we’re optimising the network for, evaluated on particular inputs) to compute the desired gradient updates.
The idea of the system just predicting the desired gradient updates without any ground truth supervisory signal seems fanciful.
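To make that intuition concrete, here is a minimal sketch (a toy squared-error linear model, not anything from this discussion): the gradient update to a weight depends on the prediction error against a ground-truth target, so computing the update requires actually evaluating the objective on that input.

```python
import numpy as np

# Toy squared-error linear model: loss(w) = 0.5 * (w @ x - y)**2.
# The gradient update to the weights, d loss / d w = (w @ x - y) * x,
# depends on the ground-truth target y, i.e. on empirical feedback
# from the objective on this particular input.

rng = np.random.default_rng(0)
w = rng.normal(size=3)       # current weights
x = rng.normal(size=3)       # a particular input
y = 1.7                      # ground-truth target for that input

error = w @ x - y            # requires knowing the ground truth y
grad = error * x             # the gradient update to the weights
w_new = w - 0.1 * grad       # one SGD step

print(grad)
print(w_new)
```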
I don’t actually share this intuition.
Ehh, protein folding feels equally fanciful to me: figuring out how a protein will fold without actually simulating the physical interactions.
Meanwhile, we already have humans editing model weights to change model behavior in desired ways: https://www.lesswrong.com/posts/gRp6FAWcQiCWkouN5/maze-solving-agents-add-a-top-right-vector-make-the-agent-go
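For reference, the linked post steers the agent by adding a hand-crafted "top-right" vector to intermediate activations. The sketch below shows that style of intervention on a hypothetical toy network (the architecture, hook placement, and steering vector here are made up for illustration, not the post's actual code); the relevant point for the debate is that the edit changes behavior without computing any gradient.

```python
import torch
import torch.nn as nn

# Hypothetical tiny policy network standing in for the maze-solving agent;
# none of this is the linked post's actual architecture or code.
net = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),        # e.g. logits over four movement directions
)

# A "steering vector" to add to the hidden activations. In the linked post
# it is derived from activations on chosen inputs; here it is just random.
steering_vector = torch.randn(32) * 0.5

def add_steering(module, inputs, output):
    # Forward hook: shift the hidden activations by a fixed vector.
    return output + steering_vector

handle = net[1].register_forward_hook(add_steering)

obs = torch.randn(1, 16)     # a stand-in observation
steered_logits = net(obs)    # behavior changes with no gradient step
handle.remove()

print(steered_logits)
```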