A bit of a side note, but I don’t think you even need to appeal to new architectures: the NTK approximation appears to perform substantially worse even with plain MLPs (see this paper, among others).