Yup, people have done this (taking the infinite-width limit at the same time): see here, here. Generally the kernels do worse than the original networks, but not by a lot. On the other hand, they’re usually applied to problems that aren’t super-hard, where non-neural-net classifiers already work pretty well. And these models definitely can’t explain feature learning, since the functions computed by individual neurons don’t change at all during training.
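To make the "neurons don't change" point concrete, here is a minimal sketch of the simplest case I know of. It is not taken from the linked work, and it is not the full tangent kernel: it assumes a single hidden ReLU layer with standard Gaussian weights that stay frozen at initialization, so only the readout would be trained. In that setting the average of hidden-unit products converges, as the width grows, to a closed-form kernel (the order-1 arc-cosine kernel). The names `relu_kernel` and `finite_width_kernel` are made up for illustration.

```python
import math
import random


def relu_kernel(x, y):
    """Closed-form infinite-width limit of mean_i relu(w_i.x) * relu(w_i.y),
    assuming each w_i ~ N(0, I) and the hidden weights are never updated."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    cos_t = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    theta = math.acos(cos_t)
    return nx * ny * (math.sin(theta) + (math.pi - theta) * cos_t) / (2.0 * math.pi)


def finite_width_kernel(x, y, width, rng):
    """Monte Carlo version: average over `width` random, frozen ReLU neurons."""
    total = 0.0
    for _ in range(width):
        w = [rng.gauss(0.0, 1.0) for _ in x]
        hx = max(0.0, sum(wi * xi for wi, xi in zip(w, x)))
        hy = max(0.0, sum(wi * yi for wi, yi in zip(w, y)))
        total += hx * hy
    return total / width


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = [1.0, 0.0], [0.6, 0.8]
    print("closed form :", relu_kernel(x, y))
    for width in (100, 10_000, 100_000):
        print(f"width {width:>7}:", finite_width_kernel(x, y, width, rng))
```

The kernel depends only on the initialization distribution, never on the data or the labels, which is the sense in which nothing feature-like gets learned here. The full tangent kernel adds a term for the hidden-layer gradients, but that term is likewise fixed at initialization.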