Curious: why would you want to prevent grokking? Normally one would want to encourage it.
To see if Omnigrok’s mechanism for enabling/stopping grokking works beyond the three areas they investigated. If it does, then we can be more confident that we know how to stop grokking from occurring and instead force the model to reach the same performance incrementally. That might make it easier to predict future performance, and it would also give us more information about the phenomenon. Plus, like, I’m implementing some deep RL algorithms anyway, so might as well, right?