(Most of my comment was ninja’ed by Paul)
I’ll add that I’m pretty sure that RL is doing something. The authors claim that no one has applied search methods to 4x4 matrix multiplication or larger, and I think the branching factor of brute-force search without a strong heuristic grows something like the 6th power of n? So it seems doubtful that such search methods will scale.
That being said, I agree that it’s a bit odd not to do a head-to-head comparison at equal compute. The authors just cite related work (which uses much less compute) and claim superiority over it.