A couple of points. Your bit about connectivism IMO ignores the fact that biological systems have their own constraints. If you looked at animals and guessed that the most energy-efficient, easiest-to-maintain way to move on most terrains is probably legs, you'd be wrong: it's wheels or treads. But legs are what organic life, which needs to keep all its moving parts connected, could come up with.
A similar problem might be at work with connectivism. We designed MLPs in the first place by drawing inspiration from neurons, so we were kinda biased from the start. As a general rule, we know that, if made arbitrarily large, they're essentially universal function approximators. And any universal approximator ought to be, in principle, equivalent: you could use a crazily high-dimensional spline or what have you; the point is that you have a domain, a codomain, and a bunch of parameters you can tune to make that function fit your data as closely as possible. What then determines the choice is the practicality of both computation and parameter fitting. The connected systems we use in ML happen to be very convenient for this, but I'd be fairly surprised if that were the reason biology uses them; that would mean our brains do gradient descent and backpropagation too. And even for ML there might be better choices that we just haven't discovered yet.
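To make the "any universal approximator would do, in principle" point concrete, here's a minimal toy sketch that fits the same noisy function with a smoothing spline and with a small MLP. The libraries (scipy, scikit-learn) and settings are just stand-ins for illustration; neither is being proposed as the better choice, since the choice is exactly the practicality question above.

```python
# Toy illustration: two different universal approximators fit to the same data.
# The spline's parameters are knot coefficients; the MLP's are weights tuned
# by gradient descent. Both are "a function with knobs you turn to fit data".
import numpy as np
from scipy.interpolate import UnivariateSpline
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)  # noisy target

# Approximator 1: smoothing spline.
spline = UnivariateSpline(x, y, s=1.0)

# Approximator 2: small MLP, fit by gradient-based optimization.
mlp = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
mlp.fit(x.reshape(-1, 1), y)

x_test = np.linspace(0, 2 * np.pi, 50)
print("spline MSE:", np.mean((spline(x_test) - np.sin(x_test)) ** 2))
print("MLP    MSE:", np.mean((mlp.predict(x_test.reshape(-1, 1)) - np.sin(x_test)) ** 2))
```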
That said, I don't think connected systems are necessarily a bad paradigm. I agree with you that the complexity is likely due to the fact that we're fitting a really, really complex function in a space of such high dimensionality that the human mind can't even begin to visualize it, so no wonder the parameters are hard to figure out. What might be possible, though, is to design a connected system with dedicated, distinct functional areas that make it more "modular" in its overall structure (either by prearranging those areas, or by rearranging blocks between stages of training). That could make the training process more complex or the final result less efficient, the same way a program compiled for debugging isn't as performant as one optimized for speed. But you can look inside the program compiled for debugging, and that's kind of important and useful. The essence of the complaint here, IMO, is that as far as these trade-offs go, researchers seem all-in on performance and aren't spending nearly enough effort debugging their "code". It may be that the alternative to giant inscrutable matrices is just a bunch of smaller, more scrutable ones, but hey. That would be progress!
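For concreteness, here's a purely hypothetical sketch of what "prearranged functional areas, trained in stages" could look like. PyTorch is used only as an example framework; the block names, sizes, and two-stage schedule are all made up for illustration, not a claim about how anyone actually does this.

```python
# Hypothetical "modular" network: two named sub-networks with a fixed interface,
# trained in separate stages so each block can be inspected or swapped on its own.
import torch
import torch.nn as nn

class ModularNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Block A: intended as a dedicated "encoder" area.
        self.encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
        # Block B: intended as a dedicated task-specific "decision" area.
        self.head = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))

    def forward(self, x):
        return self.head(self.encoder(x))

net = ModularNet()
x, y = torch.randn(64, 16), torch.randn(64, 1)  # toy data
loss_fn = nn.MSELoss()

# Stage 1: train only the encoder block (head frozen).
for p in net.head.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam([p for p in net.parameters() if p.requires_grad], lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss_fn(net(x), y).backward()
    opt.step()

# Stage 2: freeze the encoder, train only the head. Its weights can now be
# studied in isolation, at some cost in end-to-end performance.
for p in net.encoder.parameters():
    p.requires_grad_(False)
for p in net.head.parameters():
    p.requires_grad_(True)
opt = torch.optim.Adam(net.head.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss_fn(net(x), y).backward()
    opt.step()
```

Whether anything like this scales is an open question, but it's the kind of trade-off I mean: you give up some end-to-end optimization in exchange for pieces you can actually point at.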