I know this sounds fantastic but can someone please dumb down what KANs are for me, why they’re so revolutionary (in practice, not in theory) that all the big labs would wanna switch to them?
Or is it the case that having MLPs is still a better thing for GPUs and in practice that will not change?
And how are KANs different from what SAEs (sparse autoencoders) attempt to do?
MLP or KAN doesn’t make much difference to the GPUs, since either way it’s mostly matrix multiplications. It might make some difference in how the data is routed to the GPU cores, since the shapes (width, depth) of the matrices can differ, but I don’t know the details of that.
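To make the "it's all matrix multiplications" point concrete, here's a toy sketch in NumPy. The RBF basis here is my own simplified stand-in for the B-splines used in the actual KAN paper, and all the names are made up for illustration — the point is just that evaluating a KAN layer still reduces to dense tensor contractions, which is why GPUs don't strongly prefer one layer type over the other:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_layer(x, W, b):
    # Standard MLP layer: one matmul plus a fixed nonlinearity (ReLU).
    return np.maximum(0.0, x @ W + b)

def kan_layer(x, coeffs, centers, width=1.0):
    # KAN-style layer: each edge applies a learnable 1-D function to its
    # scalar input. Here each function is a weighted sum of Gaussian radial
    # basis functions (a stand-in for B-splines).
    # Shapes: x (batch, d_in), centers (k,), coeffs (d_in, d_out, k).
    basis = np.exp(-((x[..., None] - centers) / width) ** 2)  # (batch, d_in, k)
    # Contract over the input and basis dims -> (batch, d_out).
    # This einsum is itself just batched matrix multiplication.
    return np.einsum('bik,iok->bo', basis, coeffs)

x = rng.normal(size=(4, 3))                 # batch of 4, input dim 3
W, b = rng.normal(size=(3, 5)), np.zeros(5) # MLP params, output dim 5
coeffs = rng.normal(size=(3, 5, 8))         # KAN params, 8 basis functions
centers = np.linspace(-2, 2, 8)

print(mlp_layer(x, W, b).shape)             # (4, 5)
print(kan_layer(x, coeffs, centers).shape)  # (4, 5)
```

Both layers take the same input shape to the same output shape; the KAN just spends extra parameters (the per-edge basis coefficients) on learning its activations instead of using a fixed one.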