EDIT: Thank you so much for replying to the strongest part of my argument; no one else tried to address it (despite many downvotes).
I disagree with the position that technical AI alignment research is counterproductive due to increasing capabilities, but I think this is very complicated and worth thinking about in greater depth.
Do you think it’s possible that your intuition that alignment research is counterproductive comes from comparing the plausibility of these two outcomes:
1. Increasing alignment research causes people to solve AI alignment, and humanity survives.
2. Increasing alignment research leads to an improvement in AI capabilities, allowing AI labs to build a superintelligence which then kills humanity.
And you decided that outcome 2 felt more likely?
Well, that’s the wrong comparison to make.
The right comparison should be:
1. Increasing alignment research causes people to improve AI alignment, and humanity survives in a world where we otherwise wouldn’t survive.
2. Increasing alignment research leads to an improvement in AI capabilities, allowing AI labs to build a superintelligence which then kills humanity in a world where we otherwise would have survived.
In this case, I think even you would agree that P(1) > P(2).
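To spell out the arithmetic behind this (my own notation, introduced just for this sketch): let $S_1$ be the event that humanity survives given the extra alignment research, and $S_0$ the event that humanity survives without it. Outcome 1 is $S_1 \wedge \neg S_0$ and outcome 2 is $\neg S_1 \wedge S_0$, so

$$P(1) - P(2) = P(S_1 \wedge \neg S_0) - P(\neg S_1 \wedge S_0) = P(S_1) - P(S_0).$$

In other words, P(1) > P(2) is exactly the claim that the extra research raises the overall probability of survival; it doesn’t depend on how likely either extreme scenario feels on its own.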
P(2) is very unlikely because if increasing alignment research really would lead to such a superintelligence, and it really would kill humanity… then let’s be honest, we’re probably doomed in that case anyways, even without increasing alignment research.
If that really were the case, the only surviving civilizations would have had different histories, or different geographies (e.g. only a single continent, with room for just one country), leading to a single government which could actually enforce an AI pause.
We’re unlikely to live in a world so pessimistic that alignment research is counterproductive, yet so optimistic that we could survive without that alignment research.
we’re probably doomed in that case anyways, even without increasing alignment research.
I believe we’re probably doomed anyways.
I think even you would agree that P(1) > P(2)
Sorry to disappoint you, but I do not agree.
Although I don’t consider it quite impossible that we will figure out alignment, most of my hope for our survival is in other things, such as a group taking over the world and then using their power to ban AI research. (Note that that is in direct contradiction to your final sentence.) So for example, if Putin or Xi were dictator of the world, my guess is that there is a good chance he would choose to ban all AI research. Why? It has unpredictable consequences. We Westerners (particularly Americans) are comfortable with drastic change, even if that change has drastic unpredictable effects on society; non-Westerners are much more skeptical: there have been too many invasions, revolutions and peasant rebellions that have killed millions in their countries. I tend to think that the main reason Xi supports China’s AI industry is to prevent the US and the West from superseding China, and if that consideration were removed (because, for example, he had gained dictatorial control over the whole world), he’d choose to just shut it down (and he wouldn’t feel the need to have a very strong argument for shutting it down, the way Western decision-makers would: non-Western leaders shut important things down all the time, or at least they would if the governments they led had the funding and the administrative capacity to do so).
Of course Xi’s acquiring dictatorial control over the whole world is extremely unlikely, but the magnitude of the technological and societal changes that are coming will tend to present opportunities for certain coalitions to gain, and to keep, enough power to shut AI research down worldwide. (Having power in all countries hosting leading-edge fabs is probably enough.) I don’t think this ruling coalition necessarily needs to believe that AI presents a potent risk of human extinction in order to choose to shut it down.
I am aware that some reading this will react to “some coalition manages to gain power over the whole world” even more negatively than to “AI research causes the extinction of the entire human race”. I guess my response is that I needed an example of a process that could save us and that would feel plausible—i.e., something that might actually happen. I hasten to add that there might be other processes that save us that don’t elicit such a negative reaction—including processes whose nature we cannot even currently imagine.
I’m very skeptical of any intervention that reduces the amount of time we have left in the hopes that this AI juggernaut is not really as potent a threat to us as it currently appears. I was much, much less skeptical of alignment research 20 years ago, but since then a research organization has been exploring the solution space, and the leader of that organization (Nate Soares) and its most senior researcher (Eliezer) are reporting that the alignment project is almost completely hopeless. Yes, this organization (MIRI) is kind of small, but it has been funded well enough to keep about a dozen top-notch researchers on the payroll and it has been competently led. Also, for research efforts like this, how many years the team had to work on the problem is more important than the size of the team, and 22 years is a pretty long time to end up with almost no progress other than some initial insights (around the orthogonality thesis, the fragility of value, convergent instrumental values, and CEV as a proposed solution) if the problem were solvable by the current generation of human beings.
OK, if I’m being fair and balanced, then I have to concede that it was probably only in 2006 (when Eliezer figured out how to write a long intellectually-dense blog post every day) or even only in 2008 (when Anna Salamon joined the organization—she was very good at recruiting and had a lot of energy to travel and to meet people) that Eliezer’s research organization could start to pick and choose among a broad pool of very talented people, but still, between 2008 and now is 17 years, which again is a long time for a strong team to fail to make even a decent fraction of the progress humanity would seem to need to make on the alignment problem if in fact the alignment problem is solvable by spending more money on it. It does not appear to me to be the sort of problem that can be solved with 1 or 2 additional insights; it seems a lot more like the kind of problem where insight 1 is needed, but before any mere human can find insight 1, all the researchers need to have already known insight 2, and to have any hope of finding insight 2, they all would have had to know insight 3, and so on.
I don’t agree that the probability of alignment research succeeding is that low. 17 years or 22 years of trying and failing is strong evidence against it being easy, but doesn’t prove that it is so hard that increasing alignment research is useless.
People worked on capabilities for decades, and never got anywhere until recently, when the hardware caught up, and it was discovered that scaling works unexpectedly well.
There is a chance that alignment research now might be more useful than alignment research earlier, though there is uncertainty in everything.
We should have uncertainty in the Ten Levels of AI Alignment Difficulty.

The comparison

It’s unlikely that 22 years of alignment research is insufficient but 23 years of alignment research is sufficient.
But what’s even more unlikely is that $200 billion on capabilities research plus $0.1 billion on alignment research is survivable, while $210 billion on capabilities research plus $1 billion on alignment research is deadly.
In the same way adding a little alignment research is unlikely to turn failure into success, adding a little capabilities research is unlikely to turn success into failure.
It’s also unlikely that alignment effort is even deadlier than capabilities effort dollar for dollar. That would imply that reallocating money from alignment to capabilities would paradoxically slow down capabilities and save everyone.
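One way to see why that would be paradoxical (a toy model I’m introducing here, under the assumption that the only way alignment spending is deadly is by leaking into capabilities progress): write effective capabilities progress as $C = c + k\,a$, where $c$ is capabilities spending, $a$ is alignment spending, and $k$ is the fraction of an alignment dollar that ends up advancing capabilities. “Alignment effort is deadlier than capabilities effort dollar for dollar” would then require

$$\frac{\partial C}{\partial a} > \frac{\partial C}{\partial c} \;\Longleftrightarrow\; k > 1,$$

i.e. a dollar labeled “alignment” would have to buy more capabilities progress than a dollar labeled “capabilities”, in which case relabeling that dollar as capabilities spending would slow capabilities down, which is the absurd conclusion.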
Even if you are right
Even if you are right that delaying AI capabilities is all that matters, Anthropic still might be a good thing.
Even if Anthropic disappeared, or had never existed in the first place, the AI investors would continue to pay money for research, and the AI researchers would continue to do research for money. Anthropic is just the middleman.
If Anthropic never existed, the middlemen would consist of only OpenAI, DeepMind, Meta AI, and other labs. These labs would not only act as middlemen, but would lobby against regulation far more aggressively than Anthropic, and might discredit the entire “AI Notkilleveryoneism” movement.
To continue existing as one of these middlemen, you cannot simply stop paying the AI researchers for capabilities research; otherwise the AI investors and AI customers will stop paying you in turn. You cannot stem the flow, you can only decide how much of it goes through you.
It’s the old capitalist dilemma of “doing evil or getting out-competed by those who do.”
For their part, Anthropic redirected some of that flow to alignment research, and took what few precautions they could afford to take. They were also less willing than other labs to publish capabilities research. That may be the best one can hope to accomplish against this unstoppable flow from AI investors to AI researchers.
The few precautions which Anthropic did take may have already cost them their first-mover advantage. Had Anthropic raced ahead before OpenAI released ChatGPT, Anthropic might have stolen the limelight, won the early customers and investors, and become bigger than OpenAI.
But what’s even more unlikely is that $200 billion on capabilities research plus $0.1 billion on alignment research is survivable, while $210 billion on capabilities research plus $1 billion on alignment research is deadly.
This assumes that alignment success is the most likely avenue to safety for humankind, whereas, like I said, I consider other avenues more likely. Actually, there needs to be a qualifier on that: I consider other avenues more likely than the alignment project’s succeeding while the current generation of AI researchers remains free to push capabilities. If the AI capabilities juggernaut could be stopped for 150 years, giving the human population time to get smarter and wiser, then alignment is likely (say p = .7) to succeed in my estimation. I am informed by Eliezer in his latest interview that such a success would probably use some technology other than deep learning to create the AI’s capabilities; i.e., deep learning is particularly hard to align.
Central to my thinking is my belief that alignment is just a significantly harder problem than the problem of creating an AI capable of killing us all. Does any of the reasoning you do in your section “The comparison” change if you start believing that alignment is much, much harder than creating a superhuman (unaligned) AI?
It will probably come as no great surprise that I am unmoved by the arguments I have seen (including your argument) that Anthropic is so much better than OpenAI that it helps the global situation for me to support Anthropic. (If it were up to me, both would be shut down today, assuming I couldn’t delegate the decision to someone else and had to decide right now, with no time to gather more information.) But I’m not very certain, and I would pay attention to future arguments for supporting Anthropic or some other lab.

Thanks for engaging with my comments.
Thank you. I’ve always been curious about this point of view, because a lot of people have a view similar to yours.
I do think that alignment success is the most likely avenue, but my argument doesn’t require this assumption.
Your view isn’t just that “alternative paths are more likely to succeed than alignment,” but that “alternative paths are so much more likely to succeed than alignment that the marginal capabilities increase caused by alignment research (or at least by Anthropic) makes it not worth doing.”
To believe that alignment is that hopeless, there should be stronger proof than “we tried it for 22 years, and the prior probability of the threshold being between 22 years and 23 years is low.” That argument can easily be turned around to argue why more alignment research is equally unlikely to cause harm (and why Anthropic is unlikely to cause harm). I also think multiplying funding can multiply progress (e.g. 4x funding ≈ 2x duration).
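For what it’s worth, the “4x funding ≈ 2x duration” figure corresponds to a square-root scaling assumption (my own gloss, not a measured law):

$$\text{effective progress} \propto \sqrt{\text{funding}} \quad\Longrightarrow\quad 4\times \text{funding} \;\approx\; 2\times \text{progress} \;\approx\; 2\times \text{the effective research duration}.$$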
If you really want a singleton controlling the whole world (which I don’t agree with), your most plausible path would be for most people to see AI risk as a “desperate” problem, and for governments under desperation to agree on a worldwide military which swears to preserve civilian power structures within each country.[1]
Otherwise, the fact that no country took over the world during the last centuries strongly suggests that no country will in the next few years, and this feels more solid than your argument that “no one figured out alignment in the last 22 years, so no one will in the next few years.”
Out of curiosity, would you agree with this being the most plausible path, even if you disagree with the rest of my argument?
The most plausible story I can imagine quickly right now is that the US and China fight a war, the US wins, and it uses some of the political capital from that win to slow down the AI project, perhaps through control over the world’s leading-edge semiconductor fabs plus pressuring Beijing to ban teaching and publishing about deep learning (to go with a ban on the same things in the West). I believe that basically all the leading-edge fabs in existence, or that will be built in the next 10 years, are in countries the US has a lot of influence over, or in China. Another story: the technology for “measuring loyalty in humans” gets really good fast, giving the first group to adopt it so great an advantage that over a few years the group gains control over the territories where all the world’s leading-edge fabs and most of the trained AI researchers are.
I want to remind people of the context of this conversation: I’m trying to persuade people to refrain from actions that on expectation make human extinction arrive a little quicker because most of our (sadly slim) hope for survival IMHO flows from possibilities other than our solving (super-)alignment in time.
I would go one step further and argue that you don’t need to take over territory to shut down the semiconductor supply chain: if enough large countries believed AI risk was a desperate problem, they could convince one another and negotiate a shutdown of the supply chain.
Shutting down the supply chain (and thus all leading-edge semiconductor fabs) could slow the AI project by a long time, but probably not “150 years” since the uncooperative countries will eventually build their own supply chain and fabs.
The ruling coalition can disincentivize the development of a semiconductor supply chain outside the territories it controls by selling, worldwide, semiconductors that use “verified boot” technology to make it really hard to use them to run AI workloads, similar to how it is really hard even for the best jailbreakers to jailbreak a modern iPhone.
That’s a good idea! Even today it may be useful for export controls (depending on how reliable it can be made).
The most powerful chips might be banned from export, and have “verified boot” technology inside in case they are smuggled out.
The second most powerful chips might be exported only to trusted countries, and also have this verified boot technology in case those trusted countries end up selling them to less trusted countries, who sell them on yet again.
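To make the “verified boot” idea a bit more concrete, here is a minimal sketch of the core mechanism in Python (a toy illustration of signature-checked boot in general, not a description of any real chip or export-control scheme; the key names and the workload-policy comment are my own assumptions):

```python
# Toy sketch of signature-checked ("verified") boot for an AI accelerator.
# Assumption: the chip's immutable boot ROM holds a trusted public key and
# refuses to run any firmware image not signed by the matching private key;
# the signed firmware is then what enforces policy on AI workloads.
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

# In real hardware this key (or a hash of it) would be burned in at manufacture.
_issuer_private_key = ed25519.Ed25519PrivateKey.generate()  # held only by the issuer
TRUSTED_PUBLIC_KEY = _issuer_private_key.public_key()       # baked into the boot ROM

def sign_firmware(firmware_image: bytes) -> bytes:
    """Issuer side: sign an approved firmware image."""
    return _issuer_private_key.sign(firmware_image)

def boot(firmware_image: bytes, signature: bytes) -> None:
    """Chip side: run the firmware only if the signature checks out."""
    try:
        TRUSTED_PUBLIC_KEY.verify(signature, firmware_image)
    except InvalidSignature:
        raise RuntimeError("unsigned or tampered firmware: refusing to boot")
    print("firmware verified; booting")
    # The signed firmware would then enforce runtime policy, e.g. refusing
    # kernels whose memory/compute profile looks like large-scale training.

approved = b"firmware v1.0 with workload policy"
sig = sign_firmware(approved)
boot(approved, sig)                      # verifies and "boots"
try:
    boot(approved + b" (patched)", sig)  # tampered image fails verification
except RuntimeError as err:
    print(err)
```

The design choice doing the work is that the trusted public key lives in immutable hardware, so even a buyer with full software control cannot substitute firmware that skips the policy checks; how tamper-resistant that can be made in practice is an open question.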
People worked on capabilities for decades, and never got anywhere until recently, when the hardware caught up, and it was discovered that scaling works unexpectedly well.
If I believed that, then maybe I’d believe (like you seem to do) that there is no strong reason to believe that the alignment project cannot be finished successfully before the capabilities project creates an unaligned superhuman AI. I’m not saying scaling and hardware improvements have not been important; I’m saying they were not sufficient: algorithmic improvements were quite necessary for the field to arrive at anything like ChatGPT, and at least as early as 2006 there were algorithmic improvements that almost everyone in the machine-learning field recognized as breakthroughs or important insights. (Someone more knowledgeable about the topic might be able to push the date back into the 1990s or earlier.)
After the publication 19 years ago by Hinton et al of “A Fast Learning Algorithm for Deep Belief Nets”, basically all AI researchers recognized it as a breakthrough. Building on it was AlexNet in 2012, again recognized as an important breakthrough by essentially everyone in the field (and if some people missed it, then certainly generative adversarial networks, ResNets and AlphaGo convinced them). AlexNet was also an early, influential demonstration of training deep models on GPUs, a technique essential for the major breakthrough reported in the 2017 paper “Attention is all you need”.
In contrast, we’ve seen nothing yet in the field of alignment that is as unambiguously a breakthrough as the 2006 paper by Hinton et al, 2012’s AlexNet, or (emphatically) the 2017 paper “Attention is all you need”. In fact I suspect that some researchers could tell that the attention mechanism reported by Bahdanau et al in 2015, or the Seq2Seq models reported on by Sutskever et al in 2014, were evidence that deep-learning language models were making solid progress and that a blockbuster insight like “attention is all you need” was probably only a few years away.
The reason I believe it is very unlikely for the alignment research project to succeed before AI kills us all is that in machine learning, and in its deep-learning subfield in particular, something recognized by essentially everyone in the field as a minor or major breakthrough has occurred every few years. Many of these breakthroughs rely on earlier breakthroughs (i.e., it is very unlikely that the successive breakthrough would have occurred if the earlier breakthrough had not been disseminated to the community of researchers). During this time, despite very talented people working on it, there have been zero results in alignment research that the entire field of alignment researchers would consider a breakthrough. That does not mean it is impossible for the alignment project to be finished in time, but it does IMO make it critical for the alignment project to be prosecuted in such a way that it does not inadvertently assist the capabilities project.
Yes, much more money has been spent on capabilities research over the last 20 years than on alignment research, but money doesn’t help all that much to speed up research in which, to have any hope of solving the problem, the researchers need insight X or X2; to have any hope of arriving at insight X, they need insights Y and Y2; and to have much hope at all of arriving at Y, they need insight Z.
Even if building intelligence requires solving many many problems, preventing that intelligence from killing you may just require solving a single very hard problem. We may go from having no idea to having a very good idea.
I don’t know. My view is that we can’t be sure of these things.