Huh, thanks for this pointer! I had not read about the NTK (Neural Tangent Kernel) before. What I understand you to be saying is something like: SGD mainly affects the weights in the last layer, and the propagation down to each earlier layer is weakened by some factor, creating the exponential behaviour? This seems somewhat plausible, though I don't know enough about the NTK to make a stronger statement.
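To check whether I'm following, here is a minimal sketch of the kind of per-layer check I have in mind (my own construction, not your setup, and not an NTK calculation): a small MLP where I look at the gradient norm of each layer's weights after one backward pass. The depth, width, activation, and loss are arbitrary placeholder choices.

```python
# Hedged sketch: do earlier layers receive systematically smaller gradients
# at initialization? All hyperparameters here are placeholder choices.
import torch
import torch.nn as nn

torch.manual_seed(0)

depth, width = 5, 64
layers = []
for _ in range(depth):
    layers += [nn.Linear(width, width), nn.Tanh()]
model = nn.Sequential(*layers)

# One backward pass on a random regression target, just to populate .grad.
x = torch.randn(128, width)
y = torch.randn(128, width)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Print the gradient norm of each Linear layer's weight, from first to last.
for i, module in enumerate(model):
    if isinstance(module, nn.Linear):
        print(f"layer {i // 2}: grad norm = {module.weight.grad.norm().item():.4f}")
```

If the per-layer weakening story is right, I'd naively expect these norms to shrink roughly geometrically from the last layer to the first, but I haven't run this carefully.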
I don't understand the simulation you ran (I'm not familiar with that equation; is this a common thing to do?), but are you saying the y-values of the 5 lines (simulating 5 layers) at the last time-step (i.e. at the end of training) should be exponentially increasing, from violet to red, green, orange, and blue? They don't look exponential to my eye. Or are you thinking of the value as a function of x (training time)?
I appreciate your comment and the search for mundane explanations, though! This seems like the kind of thing where I would later say "Oh, of course."
You're right, that's not an exponential; I was wrong. That said, I don't trust my toy model enough to be convinced that my overall point is wrong. Unfortunately I don't have the time this week to run something more in-depth.
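For what it's worth, the kind of more in-depth check I'd want to run eventually looks something like the following (a rough sketch under placeholder choices of task, width, and learning rate, not a finished experiment): train a small MLP with plain SGD and compare how far each layer's weights drift from their initialization.

```python
# Hedged sketch: track per-layer weight movement over training.
# Task, depth, width, and learning rate are placeholder choices.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

depth, width = 5, 64
layers = []
for _ in range(depth):
    layers += [nn.Linear(width, width), nn.Tanh()]
model = nn.Sequential(*layers)
init = copy.deepcopy(model.state_dict())  # snapshot of the initial weights

opt = torch.optim.SGD(model.parameters(), lr=0.05)
x = torch.randn(512, width)
y = torch.randn(512, width)

for step in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

# Relative weight movement per layer, from first (closest to input) to last.
for name, p in model.named_parameters():
    if name.endswith("weight"):
        drift = (p.detach() - init[name]).norm() / init[name].norm()
        print(f"{name}: relative change = {drift.item():.4f}")
```

Whether the resulting per-layer pattern is actually exponential, rather than just monotone, is exactly the thing I'd want to eyeball on a log scale.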