LLMs use one or more hidden layers, so shouldn't the proof apply to them?
what proof?
Of the universal approximation theorem
How are hidden layers relevant?
LLMs are neural networks, and neural networks with at least one hidden layer are proven to be able to approximate any continuous function to an arbitrarily close degree, hence LLMs should be able to approximate any continuous function to an arbitrarily close degree (given enough hidden units, of course).
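For reference, here is a rough sketch of the classical width-based statement (not the exact hypotheses): for any continuous $f$ on a compact set $K \subset \mathbb{R}^n$, any $\varepsilon > 0$, and a fixed non-polynomial (e.g. sigmoidal) activation $\sigma$, there exist $N$, $\alpha_i, b_i \in \mathbb{R}$ and $w_i \in \mathbb{R}^n$ such that

\[
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma\!\left(w_i^\top x + b_i\right) \right| < \varepsilon .
\]

Note that in this form a single hidden layer suffices if it is wide enough: the guarantee is about the number of hidden units and about continuous functions on a compact domain, not about stacking more layers.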