There’s a common narrative in which AI progress has come mostly from throwing more and more compute at relatively dumb algorithms.
Is this context-specific to AI? This position seems to imply that new algorithms come out of the box at only a factor of 2 above maximum efficiency, which seems like an extravagant claim (if anyone were to actually make it).
In the general software engineering context, I understood the consensus narrative to be that code has gotten less efficient on average, because the free gains from Moore’s Law have permitted a more lax approach to optimization.
Separately, regarding the bitter lesson: I have seen it come up mostly in discussions of the value of data. Example situations include supervised vs. unsupervised learning approaches; AlphaGo’s self-play training; and questions about what kinds of insights the Chinese government’s AI programs will be able to deliver from the expected expansion of surveillance data. The way I understand it, compute improvements have proven more valuable than both domain expertise (the first approach) and big data (the most recent contender).
My intuitive guess at the cause is that compute is the perspective that lets us handle the dimensionality problem with any grace at all.
Reflecting on this, I think I should have said that algorithms are the perspective that lets us handle dimensionality gracefully, but also that algorithms and compute really belong in the same category, because algorithms are how compute is exploited.
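As a toy sketch of what “handling dimensionality gracefully” can look like (my own example, not anything from the thread): estimating the volume of a unit ball inside [-1, 1]^d. A dense grid needs exponentially many points as d grows, while a Monte Carlo estimate keeps a fixed sample budget regardless of d.

```python
# Toy illustration (my example): grid-based evaluation blows up with dimension,
# while Monte Carlo sampling uses the same compute budget in any dimension.
import numpy as np

def grid_points_needed(d, k=10):
    # A grid with k points per axis needs k**d evaluations.
    return k ** d

def mc_sphere_volume(d, n_samples=100_000, seed=0):
    # Monte Carlo: sample uniformly in [-1, 1]^d and count points inside the unit ball.
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n_samples, d))
    inside_fraction = (np.sum(x**2, axis=1) <= 1.0).mean()
    return inside_fraction * 2.0 ** d  # scale by the cube's volume, 2**d

for d in (2, 5, 10):
    print(f"d={d}: grid needs {grid_points_needed(d):,} points; "
          f"MC estimate from 100k samples: {mc_sphere_volume(d):.3f}")
```

The grid count goes from 100 points at d=2 to 10 billion at d=10, while the Monte Carlo estimate still uses 100k samples; the graceful handling comes from the algorithmic choice of how to spend the compute.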
Algorithm vs. compute feels like a second-order comparison, in the same way as CPU vs. GPU, RAM vs. Flash, or SSD vs. HDD, just on the abstract side of the physical/abstraction divide. I contrast this with compute vs. data vs. expertise, which feels like the first-order comparison.
Chris Rackauckas has an informal explanation of algorithm efficiency which I always think of in this context. The pitch is that your algorithm will be efficient in proportion to how much information about your problem it has, because it can exploit that information.
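To make that concrete with a toy example of my own (not one of Rackauckas’s): if you know a linear system is tridiagonal, you can hand that structure to a banded solver and get an O(n) solve, instead of the O(n^3) generic dense solve that ignores what you know about the problem.

```python
# Toy sketch of "more problem information => more efficiency" (my example).
import numpy as np
from scipy.linalg import solve, solve_banded

n = 2000
main = 2.0 * np.ones(n)       # main diagonal
off = -1.0 * np.ones(n - 1)   # sub/super diagonals
b = np.ones(n)

# Generic path: build the full dense matrix and ignore its structure.
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
x_dense = solve(A, b)                      # O(n^3), no problem knowledge used

# Informed path: tell the solver the matrix is banded (one sub-, one superdiagonal).
ab = np.zeros((3, n))
ab[0, 1:] = off    # superdiagonal
ab[1, :] = main    # main diagonal
ab[2, :-1] = off   # subdiagonal
x_banded = solve_banded((1, 1), ab, b)     # O(n), exploits the known structure

assert np.allclose(x_dense, x_banded)
```

Both paths return the same answer; the only difference is how much information about the problem the algorithm is given to exploit.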