Yep, I used my own re-implementation, which for some reason behaves slightly differently.
I'll also note that the task in the report is modular addition, while Figure 1 from the paper (the one with the red and green lines for train/val) is the significantly harder permutation group task.
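
To make the difference between the two tasks concrete, here is a rough sketch of how each dataset could be enumerated. This is only an illustration, not the code from the report or the paper: the function names are made up, and the modulus 97 and the group S5 in the usage note are assumed values, not ones taken from either source.

```python
from itertools import permutations


def modular_addition_table(p):
    """All equations a + b = c (mod p), as (a, b, c) triples."""
    return [(a, b, (a + b) % p) for a in range(p) for b in range(p)]


def permutation_composition_table(n):
    """All equations x * y = z in the symmetric group S_n.

    Each permutation is replaced by its index in the enumeration,
    so the triples look like the modular addition ones.
    """
    perms = list(permutations(range(n)))
    index = {perm: i for i, perm in enumerate(perms)}
    table = []
    for i, x in enumerate(perms):
        for j, y in enumerate(perms):
            # Composition: (x * y)(k) = x(y(k))
            z = tuple(x[y[k]] for k in range(n))
            table.append((i, j, index[z]))
    return table


def is_commutative(table):
    """True if swapping the two operands never changes the result."""
    lookup = {(a, b): c for a, b, c in table}
    return all(lookup[(a, b)] == lookup[(b, a)] for (a, b) in lookup)
```

With the assumed sizes, `modular_addition_table(97)` has 97 * 97 equations and `permutation_composition_table(5)` has 120 * 120. One structural difference you can check directly is that the addition table is commutative while the permutation table is not.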