Interesting. But CNNs were developed for a reason to begin with, and the MLP-Mixer paper does describe a rather specific architecture as well as "modern regularization techniques". I'd say all of that counts as baking some inductive biases into the model, though I agree it's a very light touch.