I checked the intermediate network activations. It turns out the meta-learned network generates all-negative pre-activations for the final linear layer, so the ReLU zeroes out the final layer's output (apart from its bias), regardless of the initial network input.
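In case it helps, here's a minimal sketch of that mechanism (PyTorch assumed; the weights below are hand-picked toys standing in for the meta-learned parameters, not the actual ones):

```python
# Toy illustration: if every pre-activation feeding the final linear layer is
# negative, the ReLU zeroes them all and the network's output collapses to the
# final layer's bias, independent of the input.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(1, 8),
    nn.ReLU(),
    nn.Linear(8, 8),
    nn.ReLU(),          # this ReLU kills the all-negative activations
    nn.Linear(8, 1),
)

with torch.no_grad():
    # Crude stand-in for what the meta-learned weights achieve: force the
    # penultimate layer's outputs to be negative no matter what the input is.
    net[2].weight.zero_()
    net[2].bias.fill_(-1.0)
    # The final layer's bias is then the only thing reaching the output; set
    # it to 1 so the network models the base function f(x) = 1.
    net[4].bias.fill_(1.0)

x = torch.linspace(-5.0, 5.0, 7).unsqueeze(1)
print(net(x).squeeze())   # ~1.0 everywhere, independent of x
```

If I understand the setup right, this would also explain the resistance to retraining: with all of the final layer's inputs zeroed, no gradient flows back through that ReLU or into the final layer's weights, so gradient descent on the new objective can only move the final bias.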
I’ve begun experiments with flipped base and meta functions (network initially models sin(x) and resists being retrained to model f(x) = 1).
Could you please share the results in case you ended up finishing those experiments?