People worked on capabilities for decades, and never got anywhere until recently, when the hardware caught up, and it was discovered that scaling works unexpectedly well.
If I believed that, then maybe I’d believe (like you seem to) that there is no strong reason to think the alignment project cannot be finished successfully before the capabilities project creates an unaligned super-human AI. I’m not saying scaling and hardware improvements have not been important; I’m saying they were not sufficient: algorithmic improvements were necessary for the field to arrive at anything like ChatGPT, and at least as early as 2006 there were algorithmic advances that almost everyone in the machine-learning field recognized as breakthroughs or important insights. (Someone more knowledgeable about the topic might be able to push the date back into the 1990s or earlier.)
After Hinton et al. published “A Fast Learning Algorithm for Deep Belief Nets” in 2006, basically all AI researchers recognized it as a breakthrough. Building on it was AlexNet in 2012, again recognized as an important breakthrough by essentially everyone in the field (and if some people missed it, then certainly generative adversarial networks, ResNets and AlphaGo convinced them). AlexNet was one of the first deep models trained on GPUs, a technique essential for the major breakthrough reported in the 2017 paper “Attention Is All You Need”.
In contrast, we’ve seen nothing yet in the field of alignment that is as unambiguously a breakthrough as the 2006 paper by Hinton et al., 2012’s AlexNet or (emphatically) the 2017 paper “Attention Is All You Need”. In fact I suspect that some researchers could tell that the attention mechanism reported by Bahdanau et al. in 2015 and the seq2seq models reported by Sutskever et al. in 2014 were evidence that deep-learning language models were making solid progress and that a blockbuster insight like “attention is all you need” was probably only a few years away.
The reason I believe it is very unlikely that the alignment research project will succeed before AI kills us all is that in machine learning, and in its deep-learning subfield, something recognized by essentially everyone in the field as a minor or major breakthrough has occurred every few years. Many of these breakthroughs rely on earlier breakthroughs (i.e., it is very unlikely that the later breakthrough would have occurred if the earlier one had not been disseminated to the community of researchers). During this time, despite very talented people working on it, there have been zero results in alignment research that the entire field of alignment researchers would consider a breakthrough. That does not mean it is impossible for the alignment project to be finished in time, but it does IMO make it critical for the alignment project to be prosecuted in such a way that it does not inadvertently assist the capabilities project.
Yes, much more money has been spent on capability research over the last 20 years than on alignment research, but money doesn’t help all that much to speed up research in which, to have any hope of solving the problem, the researchers need insight X or X2; to have any hope of arriving at insight X, they need insights Y and Y2; and to have much hope at all of arriving at Y, they need insight Z.
Even if building intelligence requires solving many, many problems, preventing that intelligence from killing you may just require solving a single very hard problem. We may go from having no idea to having a very good idea.
I don’t know. My view is that we can’t be sure of these things.