streawkceur comments on How uniform is the neocortex?

streawkceur 9 May 2020 14:19 UTC
1 point
Deep learning is a general method in the sense that most tasks are solved by utilizing a handful of basic tools from a standard toolkit, adapted for the specific task at hand. Once you’ve selected the basic tools, all that’s left is figuring out how to supply the training data, specifying the objective that lets the AI know how well it’s doing, throwing a lot of computation at the problem, and fiddling with details. My understanding is that there typically isn’t much conceptual ingenuity involved in solving the problems, that most of the work goes into fiddling with details, and that trying to be clever doesn’t lead to better results than using standard tricks with more computation and training data. It’s also worth noting that most of the tools in this standard toolkit have been around since the 90′s (e.g. convolutional neural networks, LSTMs, reinforcement learning, backpropagation), and that the recent boom in AI was driven by using these decades-old tools with unprecedented amounts of computation.
Well, the “details” are in fact hard to come up with, can be reused across problems, and do make the difference between working well and not working well! It’s a bit like saying that general relativity fills in some details in the claim that nature is described by differential equations, which was made much earlier.
In the AlexNet paper [1], ReLU units were referred to as nonstandard and referenced from a 2010 paper, and Dropout regularization was introduced as a recent invention from 2012. In fact, the efficiency of computer vision DL architectures has increased faster than that of the silicon since then (https://openai.com/blog/ai-and-efficiency/).
My understanding of the claim made by the “bitter lesson” article you link to is not that intellectual effort is worthless when it comes to AI, but that the effort should be directed at improving the efficiency with which the computer learns from training data, not implementing human understanding of the problem in the computer directly.
In a very general sense, e.g. attention mechanisms can be understood to be inspired by subjective experience though (even though here, as well, the effort was in developing things that work for computers, not in thinking really hard about how a human pays attention and formalizing that).
[1] https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf