I just came across this paper, which derives an expression for the posterior distribution of the weights in each layer in the infinite-width limit. The result: in every layer but the last, the posterior is unchanged from the prior. So it does seem that there is no feature learning in this limit.