This is a good hypothesis, and it seems like it can be checked by removing the first however many timesteps from the SVD calculation.
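A rough sketch of what that check could look like, in case it's useful. Everything here is an assumption on my part: I'm treating the data as a matrix with one row per timestep and one column per feature, and the names `top_singular_vectors`, `subspace_overlap`, `skip`, and `k` are hypothetical, not from the original analysis. It compares the top right singular vectors computed with and without the early rows.

```python
import numpy as np

def top_singular_vectors(trajectory, skip=0, k=10):
    """Top-k singular values and right singular vectors after dropping the first `skip` rows.

    Assumes `trajectory` has shape (timesteps, features); this layout is a guess.
    """
    X = trajectory[skip:]
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    return s[:k], vt[:k]

def subspace_overlap(vt_a, vt_b):
    """Mean squared cosine of the principal angles between two row spaces.

    1.0 means the two sets of vectors span the same subspace; 0.0 means orthogonal.
    """
    cosines = np.linalg.svd(vt_a @ vt_b.T, compute_uv=False)
    return float(np.mean(cosines ** 2))

# Synthetic stand-in for the real data: a rank-5 trajectory over 200 timesteps.
rng = np.random.default_rng(0)
traj = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 30))

s_full, vt_full = top_singular_vectors(traj, skip=0, k=5)
s_late, vt_late = top_singular_vectors(traj, skip=50, k=5)
print(subspace_overlap(vt_full, vt_late))
```

If the overlap stays close to 1 as `skip` grows, the early timesteps probably aren't what's driving the singular vectors; if it drops off, that would support the hypothesis.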
I have! I tried the same thing on a simpler network trained on an algorithmic task and got similar results. In that case I got 10 singular vectors from 8M(?) timesteps.