Yep, most of my hope is on our civilization’s coordination mechanisms kicking in in time. Most of the world’s problems seem to be failures to coordinate, but that’s not the same as saying we can’t coordinate.
This is where most of my anticipated success paths lie as well.
Other hopes are around a technical breakthrough that advances alignment more than capabilities…
I do not really understand how technical advance in alignment realistically becomes a success path. I anticipate that in order for improved alignment to be useful, it would need to be present in essentially all AI agents or it would need to be present in the most powerful AI agent such that the aligned agent could dominate other unaligned AI agents. I don’t expect uniformity of adoption and I don’t necessarily expect alignment to correlate with agent capability. By my estimation, this success path rests on the probability that the organization with the most capable AI agent is also specifically interested in ensuring alignment of that agent. I expect these goals to interfere with each other to some degree such that this confluence is unlikely. Are your expectations different?
I have a massive level of uncertainty around AGI timelines, but there’s an uncomfortably large amount of probability mass on the possibility that, through some breakthrough or secret project, AGI was achieved yesterday and the news just hasn’t caught up with me yet.
I have not been thinking deeply in the direction of a superintelligent AGI having been achieved already. It certainly seems possible. It would invalidate most of the things I have thus far thought of as plausible mitigation measures.
What ideas are those?
Assuming a superintelligent AGI does not already exist, I would expect someone with a high P(doom) to be considering options of the form:
Use a smart but not self-improving AI agent to antagonize the world with the goal of making advanced societies believe that AGI is a bad idea and precipitating effective government actions. You could call this the Ozymandias approach.
Identify key resources involved in AI development and work to restrict those resources. For truly desperate individuals this might look like the Metcalf attack, but a tamer approach might be something more along the lines of investing in a grid operator and pushing to increase delivery fees to data centers.
I haven’t pursued these thoughts in any serious way because my estimation of the threat isn’t as high as yours. I think it is likely we are unintentionally heading toward the Ozymandias approach anyhow.
Use a smart but not self-improving AI agent to antagonize the world with the goal of making advanced societies believe that AGI is a bad idea and precipitating effective government actions. You could call this the Ozymandias approach.
ChaosGPT already exists. It’s incompetent to the point of being comical at the moment, but maybe more powerful analogues will appear and wreak havoc. Considering the current prevalence of malware, it might be more surprising if something like this didn’t happen.
We’ve already seen developments that could have been considered AI “warning shots” in the past. So far, they haven’t been enough to stop capabilities advancement. Why would the next one be any different? We’re already living in a world with literal wars killing people right now, and crazy terrorists with various ideologies. It’s surprising what people get used to. How bad would a warning shot have to be to shock the world into action given that background noise? Or would we be desensitized by then by the smaller warning shots leading up to it? Boiling the frog, so to speak. I honestly don’t know. And by the time a warning shot gets that bad, can we act in time to survive the next one?
Intentionally causing earlier warning shots would be evil, illegal, destructive, and undignified. Even “purely” economic damage at sufficient scale is going to literally kill people. Our best chance is civilization stepping up and coordinating. That means regulations and treaties, and only then the threat of violence to enforce the laws and impose the global consensus on any remaining rogue nations. That looks like the police and the army, not terrorists and hackers.
I do not really understand how technical advance in alignment realistically becomes a success path. I anticipate that in order for improved alignment to be useful, it would need to be present in essentially all AI agents or it would need to be present in the most powerful AI agent such that the aligned agent could dominate other unaligned AI agents.
The instrumental convergence of goals implies that a powerful AI would almost certainly act to prevent any rivals from emerging, whether aligned or not. In the intelligence explosion scenario, progress would be rapid enough that the first mover achieves a decisive strategic advantage over the entire world. If we find an alignment solution robust enough to survive the intelligence explosion, it will set up guardrails to prevent most catastrophes, including the emergence of unaligned AGIs.
I don’t expect uniformity of adoption and I don’t necessarily expect alignment to correlate with agent capability. By my estimation, this success path rests on the probability that the organization with the most capable AI agent is also specifically interested in ensuring alignment of that agent. I expect these goals to interfere with each other to some degree such that this confluence is unlikely. Are your expectations different?
Alignment and capabilities don’t necessarily correlate, and that accounts for a lot of why my p(doom) is so high. But more aligned agents are, in principle, more useful, so rational organizations should be motivated to pursue aligned AGI, not just AGI. Unfortunately, alignment research seems barely tractable, capabilities can be brute-forced (and look valuable in the short term), and corporate incentive structures being what they are, what we’re seeing in practice is a reckless amount of risk-taking. Regulation could alter those incentives by attaching appropriate costs to the externality.
We have already identified some key resources involved in AI development that could be restricted. The economic bottlenecks are mainly around high energy requirements and chip manufacturing.
Energy is probably too connected to the rest of the economy to be a good regulatory lever, but the U.S. power grid can’t currently handle the scale of the data centers the AI labs want for model training. That might buy us a little time. Big tech is already talking about buying small modular nuclear reactors to power the next generation of data centers. Those probably won’t be ready until the early 2030s. Unfortunately, that also creates pressures to move training to China or the Middle East where energy is cheaper, but where governments are less concerned about human rights.
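To see why grid capacity bites, here is a rough back-of-envelope sketch. All the numbers are illustrative assumptions on my part (accelerator count, per-chip power, overhead factor), not figures from any actual lab:

```python
# Back-of-envelope: facility power for a hypothetical frontier training cluster.
# Every constant below is an illustrative assumption, not a real lab's figure.
ACCELERATORS = 100_000        # hypothetical GPU count for one frontier run
WATTS_PER_ACCELERATOR = 700   # rough draw of a current high-end AI accelerator
PUE = 1.3                     # power usage effectiveness: cooling/networking overhead

it_load_mw = ACCELERATORS * WATTS_PER_ACCELERATOR / 1e6  # megawatts of compute
facility_mw = it_load_mw * PUE                           # total site draw
print(f"IT load: {it_load_mw:.0f} MW, facility: {facility_mw:.0f} MW")
```

Under those assumptions you land around a hundred megawatts for a single site, which is small-city scale; that is why grid interconnection queues slow these projects down, and why the labs are eyeing dedicated small modular reactors.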
A recent hurricane flooding high-purity quartz mines made headlines because chip producers require it for the crucibles used in making silicon wafers. Lower purity means accidental doping of the silicon crystal, which means lower chip yields per wafer, at best. Those mines aren’t the only source, but they seem to be the best one. There might also be ways to utilize lower-purity materials, but that might take time to develop and would require a lot more energy, which is already a bottleneck.
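The purity-to-yield link can be sketched with the standard Poisson die-yield model, Y = exp(−A·D0), where A is die area and D0 is defect density. It's a simplification, and the defect densities below are made-up illustrative values, but it shows why contamination hits large AI accelerator dies much harder than small dies:

```python
import math

def die_yield(area_cm2: float, defect_density: float) -> float:
    """Poisson yield model: probability a die of given area has zero defects."""
    return math.exp(-area_cm2 * defect_density)

# Illustrative comparison: a large AI accelerator die (~8 cm^2) vs. a small
# mobile die (~1 cm^2), as defect density rises (e.g. from impure crucibles).
for d0 in (0.05, 0.10, 0.20):  # defects per cm^2 (assumed values)
    print(f"D0={d0}: large die {die_yield(8.0, d0):.0%}, "
          f"small die {die_yield(1.0, d0):.0%}")
```

Because yield falls exponentially in die area times defect density, a bump in D0 that barely dents small-chip yields can cut large-die yields badly, which is why crucible purity matters disproportionately for cutting-edge AI chips.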
The very cutting-edge chips required for AI training runs require some delicate and expensive extreme-ultraviolet lithography machines to manufacture. They literally have to plasmify tin droplets with a pulsed laser to reach those frequencies. ASML Holdings is currently the only company that sells these systems, and machines that advanced have their own supply chains. They have very few customers, and (last I checked) only TSMC was really using them successfully at scale. There are a lot of potential policy levers in this space, at least for now.