Edouard Harris comments on Meta learning to gradient hack

Edouard Harris 6 Oct 2021 22:04 UTC
Very neat. It’s quite curious that switching to L2 for the base optimizer doesn’t seem to have resulted in the meta-initialized network learning the sine function. What sort of network did you use for the meta-learner? (It looks like the 4-layer network in your Methods refers to your base optimizer, but perhaps it’s the same architecture for both?)
Also, do you know whether the meta-initialized network eventually learns the sine function if you train it for thousands and thousands of steps? Or does it never learn, no matter how hard you train it?