Oh, huh, that makes a lot of sense! I’ll see if I can reproduce these results.
For example, from a rank-4 case with a bump, here are the inner products of two learned vectors with a single target vector.
I’m not sure this explains the grokking bumps from the mod add stuff; I’m not sure what the “competition” should be, given we see the bumps on every key frequency.
I’d be very excited to see a reproduction :-)