So, where are the Knuths of the modern era? Why is modern AI dominated by the Lorem Epsoms of the world? Where is the craftsmanship? Why are our AI tools optimized for seeming good, rather than being good?
I’m a bit confused by your confusion, and by the fact that your post does not contain what seems to me like the most straightforward explanation of these phenomena: an explanation that I am almost certain you are aware of, and which seems to be almost universally agreed upon by those interested (at any level) in interpretability in ML.
Namely: starting in the 2010s, for a host of historically contingent reasons, top AI companies (followed later by other ML hubs and researchers) came to realize that the bitter lesson is basically correct: attempts to hard-code human knowledge or intuition into frontier models ultimately harm their long-term performance relative to “literally just scale the model with more data and compute.” This led experts and top engineers to focus on figuring out scaling laws, on improving the quality and availability of data (perhaps through synthetic generation methods), and on building better end-user products through fine-tuning, RLHF, and the like, instead of the older, GOFAI-flavored project of trying to figure out at a deeper level what is going on inside the model.
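To make “figuring out scaling laws” concrete: that line of work fits simple parametric curves predicting loss from model size and data alone, with no reference to what the model represents internally. A minimal sketch, using the parametric form popularized by the Chinchilla analysis (the constants are empirical fits, and nothing in my argument depends on their exact values):

$$L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

where $N$ is the parameter count, $D$ the number of training tokens, and $E, A, B, \alpha, \beta$ are fitted constants. Progress of this kind comes from running training sweeps and fitting curves, not from understanding internal representations, which is exactly the decoupling described next.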
Another way of saying this is that top researchers and companies ultimately stumbled on an AI paradigm that increased capabilities far beyond what had previously been achievable, but at the cost of strongly decoupling “capability improvements” from “interpretability improvements” as distinct things researchers and engineers could focus on. It’s not that capability and interpretability were necessarily tightly correlated in the past; that is not the claim I am making. Rather, I am saying that in the pre-(transformer + RL) era, the way you improved your models was by identifying specific issues and analyzing them deeply to figure out how to get around them, whereas now a far simpler, easier, less insight-intensive approach is available: literally just scaling up the model with more data and compute.
So the basic point is that you no longer see all this cool research on the internal representations that models build of high-dimensional data, like word embeddings (such as the word2vec work you refer to in the second footnote), because those insights are no longer nearly as necessary for improving the capabilities or performance of the AI tools currently in use. It’s fundamentally a demand problem, not a supply problem. And the demand from the interpretability-focused AI alignment community is nowhere near large enough to bridge the gap and make up for the shift in focus and priorities among the capabilities-oriented (“normie”) AI research community.
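For readers who haven’t seen it, the flavor of insight being mourned here is the linear-analogy structure discovered in word2vec embeddings. A minimal sketch of the classic demonstration, assuming the gensim library and the standard pretrained GoogleNews vectors (the file path is illustrative; neither is specified in the post or in this comment):

```python
# Probe the linear structure of a pretrained word2vec embedding space
# via vector arithmetic. Requires: pip install gensim, plus the
# GoogleNews-vectors-negative300.bin file downloaded locally (assumed path).
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# The classic analogy: vec("king") - vec("man") + vec("woman") lands
# near vec("queen") in the embedding space.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```

This is the kind of representational finding that once doubled as a capabilities insight; under the current paradigm it is no longer on the critical path to better models, which is the demand shift I am describing.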
Indeed, the notion that the reason you no longer have deep thinkers who try to figure out what is going on, or who are “motivated by reasons” in how they approach these issues, is that “careful thinkers read LessWrong and decided against contributing to AI progress” seems… rather ridiculous to me? It’s not that I enjoy responding to an important question with derision in lieu of a substantive answer, but… I mean, the literal authors of the word2vec paper you cited were AI (capabilities) researchers working at top companies, not AI alignment researchers! Sure, some people like Bengio and Hofstadter (less relevant in practical terms), who are obviously not “LARP-ing impostors” in Wentworth’s terminology, have shifted from capabilities work to trying to raise public awareness of alignment/safety/control problems. But the vast majority (going by personal experience, general impressions, and the current state of the discourse on these topics) absolutely have not, and since they were the ones generating the clever insights back in the day, it makes sense that the overall supply of such insights has gone down.
I just don’t see how “people refuse to generate these insights because they have been convinced by AI safety advocates that doing so would dangerously increase capabilities and shorten timelines” and “people no longer generate these insights as much because they are instead focusing on other tasks that improve model capabilities more rapidly and robustly under the shifted paradigm” are two hypotheses that any reasonable person could assign similar probabilities. As I see it, the latter should be at least a few orders of magnitude more likely than the former.