This is a casual thought and by no means something I’ve thought hard about. I’m curious whether b is a lagging indicator, which is to say, whether there’s actually more magic going on in the weights, and once the weights go through this change, b catches up to it.
Another speculative thought: let’s say we are moving from 4* → 5* and |W_3| is the new W that is taking on high magnitude. Does this occur because somehow W_3 has enough internal individual weights to jointly look at its two (new) neighbors’ W_i’s roughly equally?
Does the cosine similarity and/or dot product of this new W_3 with its neighbors grow during the 4* → 5* transition, and does this occur prior to the change in b?
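For concreteness, here is a rough sketch of how one might check this from saved checkpoints. It assumes things I haven't confirmed about the setup: that each W is stored as an array of shape (hidden_dim, n_features) with W_i as column i, that b has one entry per feature, and that the neighbor indices are known. The function names `neighbor_alignment` and `track_transition` are made up for illustration.

```python
import numpy as np

def neighbor_alignment(W, i, neighbors):
    """Return a (dot, cosine) pair for column i of W against each neighbor column.

    Assumes W has shape (hidden_dim, n_features), so W[:, i] is W_i.
    """
    w = W[:, i]
    pairs = []
    for j in neighbors:
        v = W[:, j]
        dot = float(w @ v)
        denom = float(np.linalg.norm(w) * np.linalg.norm(v))
        # Cosine is undefined if either vector is exactly zero.
        cos = dot / denom if denom > 0 else float("nan")
        pairs.append((dot, cos))
    return pairs

def track_transition(checkpoints, biases, i, neighbors):
    """Record |W_i|, b_i, and neighbor alignment at every checkpoint.

    `checkpoints` and `biases` are assumed to be equal-length sequences
    of W and b arrays saved over the course of training.
    """
    rows = []
    for W, b in zip(checkpoints, biases):
        rows.append({
            "norm": float(np.linalg.norm(W[:, i])),
            "bias": float(b[i]),
            "alignment": neighbor_alignment(W, i, neighbors),
        })
    return rows
```

Plotting "norm", "bias", and the cosine values against checkpoint index on a shared axis would show whether the alignment and |W_3| start moving before b_3 does. Note the indices here are zero-based, so W_3 in one-based notation would be `i=2`.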
The changes in the matrix W and the bias b happen at the same time; b is not a lagging indicator.