My submission: when we teach modular arithmetic to people, we do it using the metaphor of clock arithmetic. Well, if you ignore the multiple frequencies and argmax weirdness, clock arithmetic is exactly what this network is doing! Find the coordinates of the hour hand (on a 113-hour clock) after rotating it x hours, and again after rotating it y hours, use trig identities to work out where it would be if you rotated it x+y hours, then count how many steps back you have to rotate to get to 0 to tell where you ended up. In fairness, the final step is a little bit different from the usual imagined rule of “look at the hour mark where the hand ends up”, but not so different that clock arithmetic counts as a bad prediction IMO.
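To make the steps concrete, here's a rough sketch of the clock story in plain Python. This is my own illustration of the analogy, not the network's actual weights or code: the names `hand` and `clock_add` are made up, and it uses a single rotation speed (k=1) and ignores the multiple-frequencies part.

```python
import math

P = 113  # hours on the clock (the modulus)

def hand(n, k=1, p=P):
    """(cos, sin) coordinates of the hand after n steps at rotation speed k."""
    angle = 2 * math.pi * k * n / p
    return math.cos(angle), math.sin(angle)

def clock_add(x, y, k=1, p=P):
    """Compute (x + y) mod p using only hand coordinates and trig identities."""
    cx, sx = hand(x, k, p)
    cy, sy = hand(y, k, p)
    # Angle-addition identities: coordinates of the hand after x + y steps.
    c_sum = cx * cy - sx * sy
    s_sum = sx * cy + cx * sy
    # For each candidate answer c, rotate back c steps and read off the cosine:
    # cos(w(x+y-c)) = cos(w(x+y))cos(wc) + sin(w(x+y))sin(wc).
    # It equals 1 exactly when rotating back c steps lands on 0.
    scores = []
    for c in range(p):
        cc, sc = hand(c, k, p)
        scores.append(c_sum * cc + s_sum * sc)
    # The "argmax weirdness": pick the candidate with the largest cosine.
    return max(range(p), key=scores.__getitem__)

print(clock_add(100, 50))  # 150 mod 113
```

The only non-clock-like bit is the last step: instead of reading the hour mark directly, it tries every candidate and keeps the one that rotates the hand closest to 0.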
Is this really an accurate analogy? I feel like clock arithmetic would be more like representing it as a rotation matrix, not a Fourier basis.
I agree a rotation matrix story would fit better, but I do think it’s a fair analogy: the numbers stored are just cosines and sines, aka the x and y coordinates of the hour hand.
Like, the only reason we’re calling it a “Fourier basis” is that we’re looking at a few different speeds of rotation, in order to scramble the second-place answers that almost get you a cosine of 1 at the end, while preserving the actual answer.
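Here's a small sketch of the "scrambling" point, again just my own illustration. The frequencies in `several` are arbitrary picks for the example, not ones read off the network, and `score` / `runner_up` are made-up helper names. The idea: with one rotation speed, the wrong answer right next to the correct one gets a cosine very close to 1; averaging over several speeds keeps the correct answer at 1 but pulls the best wrong answer down.

```python
import math

P = 113  # hours on the clock (the modulus)

def score(d, freqs, p=P):
    """Average over speeds k of cos(2*pi*k*d/p).

    d = (x + y - c) is how far candidate c is from the true answer.
    """
    return sum(math.cos(2 * math.pi * k * d / p) for k in freqs) / len(freqs)

def runner_up(freqs, p=P):
    """Best score achieved by any wrong answer (d != 0)."""
    return max(score(d, freqs, p) for d in range(1, p))

single = (1,)                  # one rotation speed
several = (3, 14, 35, 41, 52)  # arbitrary speeds, chosen only for illustration

# The correct answer (d = 0) scores 1 either way; compare the best wrong answer.
print("single :", score(0, single), runner_up(single))
print("several:", score(0, several), runner_up(several))
```

Since 113 is prime, any single nonzero speed has some wrong answer sitting one step from 0, so its runner-up is cos(2π/113). With several distinct speeds no wrong answer can be one step away for all of them at once, which is why the gap opens up.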