we don’t have a generic technique to define capacity across different architectures and loss functions
Got it. I imagine that for a particular architecture with particular network weights, you can numerically compute the marginal returns to capacity curves. But it's hard to express capacity analytically as a function of the network weights, since computing returns to capacity requires knowing what the particular features are. Is that correct?
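Here is a minimal sketch of the "numerically, given the weights" half of that question. It assumes a toy-model setting in which each feature i is already identified with an embedding vector W_i (one row of a weight matrix). It also assumes a per-feature capacity of the form C_i = (W_i·W_i)^2 / Σ_j (W_i·W_j)^2. The helper names `dot` and `feature_capacities` are hypothetical. The sketch shows why the features must be known first: the formula is evaluated row by row, so without knowing which directions are features there is nothing to plug in.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def feature_capacities(W: List[Vector]) -> List[float]:
    """Per-feature capacity C_i = (W_i . W_i)^2 / sum_j (W_i . W_j)^2.

    Assumption (hypothetical setup): each row W_i is the embedding direction
    of a known feature i. A feature with a zero embedding gets capacity 0.
    """
    caps = []
    for w_i in W:
        self_sq = dot(w_i, w_i) ** 2
        # Interference with every feature, including feature i itself.
        total = sum(dot(w_i, w_j) ** 2 for w_j in W)
        caps.append(self_sq / total if total > 0 else 0.0)
    return caps


if __name__ == "__main__":
    # Two orthogonal features in 2D: each gets a full dimension.
    print(feature_capacities([[1.0, 0.0], [0.0, 1.0]]))  # [1.0, 1.0]
    # Two features sharing a single dimension: each gets half.
    print(feature_capacities([[1.0], [1.0]]))  # [0.5, 0.5]
```

This only computes capacities at fixed weights. Getting marginal returns to capacity would additionally require writing the loss as a function of these capacities, which is the part the question suggests is hard to do in general.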
Thanks for this.