My current estimate of P(doom) in the next 15 years is 5%. That is, high enough to be concerned, but not high enough to cash out my retirement. I am curious about anyone harboring a P(doom) > 50%. This would seem to be high enough to support drastic actions. What work has been done to develop rational approaches to such a high P(doom)?
I mean, what do you think we’ve been doing all along?
I’m at like 90% in 20 years, but I’m not claiming even one significant digit on that figure. My drastic actions have been to get depressed enough to be unwilling to work in a job as stressful as my last one. I don’t want to be that miserable if we’ve only got a few years left. I don’t think I’m being sufficiently rational about it, no. It would be more dignified to make lots of money and donate it to the organization with the best chance of stopping or at least delaying our impending doom. I couldn’t tell you which one that is at the moment though.
Some are starting to take more drastic actions. Whether those actions will be effective remains to be seen.
In my view, technical alignment is not keeping up with capabilities advancement. We have no alignment tech robust enough to even possibly survive the likely intelligence explosion scenario, and it’s not likely to be developed in time. Corporate incentive structures and dysfunction make the AI labs insufficiently cautious. Even without an intelligence explosion, we also have no plans for the likely social upheaval from rapid job loss. The default outcome is that human life becomes worthless, because that’s already the case in such economies.
Our best chance at this point is probably government intervention to put the liability back on reckless AI labs for the risks they’re imposing on the rest of us, if not an outright moratorium on massive training runs.
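One way to put the two estimates in this thread (5% in 15 years and 90% in 20 years) on a common footing is to convert each into a per-year probability. The sketch below assumes a constant annual hazard rate, which neither poster claimed, and the function name `annual_hazard` is made up for illustration. Treat it as rough arithmetic, not a model of anyone's actual beliefs.

```python
def annual_hazard(p_total: float, years: float) -> float:
    """Constant per-year probability that compounds to p_total over `years`.

    Assumption (not from the thread): the risk is the same every year,
    so survival over the whole window is (1 - h) ** years.
    """
    if not 0.0 <= p_total < 1.0:
        raise ValueError("p_total must be in [0, 1)")
    if years <= 0:
        raise ValueError("years must be positive")
    return 1.0 - (1.0 - p_total) ** (1.0 / years)


# The two figures quoted in the thread.
low = annual_hazard(0.05, 15)   # 5% over 15 years
high = annual_hazard(0.90, 20)  # 90% over 20 years

print(f"5% in 15 years  -> {low:.2%} per year")
print(f"90% in 20 years -> {high:.2%} per year")
```

Under that assumption the first estimate works out to roughly a third of a percent per year and the second to roughly eleven percent per year, which is one way to see why the two positions suggest such different levels of urgency.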
I mean, what do you think we’ve been doing all along?
So, the short answer is that I am actually just ignorant about this. I’m reading here to learn more but I certainly haven’t ingested a sufficient history of relevant works. I’m happy to prioritize any recommendations that others have found insightful or thought provoking, especially from the point of view of a novice.
I can answer the specific question “what do I think” in a bit more detail. The answer should be understood to represent the viewpoint of someone who is new to the discussion and has only been exposed to an algorithmically influenced, self-selected slice of the information.
I watched the Lex Fridman interview of Eliezer Yudkowsky and around 3:06 Lex asks about what advice Eliezer would give to young people. Eliezer’s initial answer is something to the effect of “Don’t expect a long future.” I interpreted Eliezer’s answer largely as trying to evoke a sense of reverence for the seriousness of the problem. When pushed on the question a bit further, Eliezer’s answer is “…I hardly know how to fight myself at this point.” I interpreted this to mean that the space of possible actions that is being searched appears intractable from the perspective of a dedicated researcher. This, I believe, is largely the source of my question. Current approaches appear to be losing the race, so what other avenues are being explored?
I read the “Thomas Kwa’s MIRI research experience” discussion and there was a statement to the effect that MIRI does not want Nate’s mindset to be known to frontier AI labs. I interpreted this to mean that the most likely course being explored at MIRI is to build a good AI to preempt or stop a bad AI. This strikes me as plausible because my intuition is that the LLM architectures being employed are largely inefficient for developing AGI. However, the compute scaling seems to work well enough that it may win the race before other competing ideas come to fruition.
An example of an alternative approach that I read was “Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible” which seems like an avenue worth exploring, but well outside of my areas of expertise. The approach shares a characteristic with my inference of MIRI’s approach in that both appear to be pursuing highly technical avenues which would not scale meaningfully at this stage by adding helpers from the general public.
The forms of approaches that I expected to see but haven’t seen too much of thus far are those similar to the one that you linked about STOP AI. That is, approaches that would scale with the addition of approximately average people. I expected that this type of approach might take the form of disrupting model training by various means or co-opting the organizations involved with an aim toward redirection or delay. My lack of exposure to such information supports a few competing models: (1) drastic actions aren’t being pursued at large scales, (2) actions are being pursued covertly, or (3) I am focusing my attention in the wrong places.
Our best chance at this point is probably government intervention to put the liability back on reckless AI labs for the risks they’re imposing on the rest of us, if not an outright moratorium on massive training runs.
Government action strikes me as a very reasonable approach for people estimating long time scales or relatively lower probabilities. However, it seems to be a less reasonable approach if time scales are short or probabilities are high. I presume that your high P(doom) already accounts for your estimation of the probability of government action being successful. Does your high P(doom) imply that you expect these to be too slow, or too ineffective? I interpret a high P(doom) as meaning that the current set of actions that you have thought of are unlikely to be successful and therefore additional action exploration is necessary. I would expect this would include the admission of ideas which would have previously been pruned because they come with negative consequences.
The forms of approaches that I expected to see but haven’t seen too much of thus far are those similar to the one that you linked about STOP AI. That is, approaches that would scale with the addition of approximately average people.
Besides STOP AI, there’s also the less extreme PauseAI. They’re interested in things like lobbying, protests, lawsuits, etc.
I presume that your high P(doom) already accounts for your estimation of the probability of government action being successful. Does your high P(doom) imply that you expect these to be too slow, or too ineffective?
Yep, most of my hope is on our civilization’s coordination mechanisms kicking in in time. Most of the world’s problems seem to be failures to coordinate, but that’s not the same as saying we can’t coordinate. Failures are more salient, but that’s a cognitive bias. We’ve achieved a remarkable level of stability, in the light of recent history. But rationalists can see more clearly than most just how mad the world still is. Most of the public and most of our leaders fail to grasp some of the very basics of epistemology.
We used to think the public wouldn’t get it (because most people are insufficiently sane), but they actually seem appropriately suspicious of AI. We used to think a technical solution was our only realistic option, but progress there has not kept up with more powerful computers brute-forcing AI. In desperation, we asked for more time. We were pleasantly surprised at how well the message was received, but it doesn’t look like the slowdown is actually happening yet.
As a software engineer, I’ve worked in tech companies. Relatively big ones, even. I’ve seen the pressures and dysfunction. I strongly suspected that they’re not taking safety and security seriously enough to actually make a difference, and reports from insiders only confirm that narrative. If those are the institutions calling the shots when we achieve AGI, we’re dead. We desperately need more regulation to force them to behave or stop. I fear that what regulations we do get won’t be enough, but they might.
Other hopes are around a technical breakthrough that advances alignment more than capabilities, or the AI labs somehow failing in their project to produce AGI (despite the considerable resources they’ve already amassed), perhaps due to a breakdown in the scaling laws or some unrelated disaster that makes the projects too expensive to continue.
However, it seems to be a less reasonable approach if time scales are short or probabilities are high.
I have a massive level of uncertainty around AGI timelines, but there’s an uncomfortably large amount of probability mass on the possibility that through some breakthrough or secret project, AGI was achieved yesterday and the news has not caught up with me. We’re out of buffer. But we might still have decades before things get bad. We might be able to coordinate in time, with government intervention.
I would expect this would include the admission of ideas which would have previously been pruned because they come with negative consequences.
Yep, most of my hope is on our civilization’s coordination mechanisms kicking in in time. Most of the world’s problems seem to be failures to coordinate, but that’s not the same as saying we can’t coordinate.
This is where most of my anticipated success paths lie as well.
Other hopes are around a technical breakthrough that advances alignment more than capabilities…
I do not really understand how technical advance in alignment realistically becomes a success path. I anticipate that in order for improved alignment to be useful, it would need to be present in essentially all AI agents or it would need to be present in the most powerful AI agent such that the aligned agent could dominate other unaligned AI agents. I don’t expect uniformity of adoption and I don’t necessarily expect alignment to correlate with agent capability. By my estimation, this success path rests on the probability that the organization with the most capable AI agent is also specifically interested in ensuring alignment of that agent. I expect these goals to interfere with each other to some degree such that this confluence is unlikely. Are your expectations different?
I have a massive level of uncertainty around AGI timelines, but there’s an uncomfortably large amount of probability mass on the possibility that through some breakthrough or secret project, AGI was achieved yesterday and the news has not caught up with me.
I have not been thinking deeply in the direction of a superintelligent AGI having been achieved already. It certainly seems possible. It would invalidate most of the things I have thus far thought of as plausible mitigation measures.
What ideas are those?
Assuming a superintelligent AGI does not already exist, I would expect someone with a high P(doom) to be considering options of the form:
Use a smart but not self-improving AI agent to antagonize the world with the goal of making advanced societies believe that AGI is a bad idea and precipitating effective government actions. You could call this the Ozymandias approach.
Identify key resources involved in AI development and work to restrict those resources. For truly desperate individuals this might look like the Metcalf attack, but a tamer approach might be something more along the lines of investing in a grid operator and pushing to increase delivery fees to data centers.
I haven’t pursued these thoughts in any serious way because my estimation of the threat isn’t as high as yours. I think it is likely we are unintentionally heading toward the Ozymandias approach anyhow.
Use a smart but not self-improving AI agent to antagonize the world with the goal of making advanced societies believe that AGI is a bad idea and precipitating effective government actions. You could call this the Ozymandias approach.
ChaosGPT already exists. It’s incompetent to the point of being comical at the moment, but maybe more powerful analogues will appear and wreak havoc. Considering the current prevalence of malware, it might be more surprising if something like this didn’t happen.
We’ve already seen developments that could have been considered AI “warning shots” in the past. So far, they haven’t been enough to stop capabilities advancement. Why would the next one be any different? We’re already living in a world with literal wars killing people right now, and crazy terrorists with various ideologies. It’s surprising what people get used to. How bad would a warning shot have to be to shock the world into action given that background noise? Or would we be desensitized by then by the smaller warning shots leading up to it? Boiling the frog, so to speak. I honestly don’t know. And by the time a warning shot gets that bad, can we act in time to survive the next one?
Intentionally causing earlier warning shots would be evil, illegal, destructive, and undignified. Even “purely” economic damage at sufficient scale is going to literally kill people. Our best chance is civilization stepping up and coordinating. That means regulations and treaties, and only then the threat of violence to enforce the laws and impose the global consensus on any remaining rogue nations. That looks like the police and the army, not terrorists and hackers.
I do not really understand how technical advance in alignment realistically becomes a success path. I anticipate that in order for improved alignment to be useful, it would need to be present in essentially all AI agents or it would need to be present in the most powerful AI agent such that the aligned agent could dominate other unaligned AI agents.
The instrumental convergence of goals implies that a powerful AI would almost certainly act to prevent any rivals from emerging, whether aligned or not. In the intelligence explosion scenario, progress would be rapid enough that the first mover achieves a decisive strategic advantage over the entire world. If we find an alignment solution robust enough to survive the intelligence explosion, it will set up guardrails to prevent most catastrophes, including the emergence of unaligned AGIs.
I don’t expect uniformity of adoption and I don’t necessarily expect alignment to correlate with agent capability. By my estimation, this success path rests on the probability that the organization with the most capable AI agent is also specifically interested in ensuring alignment of that agent. I expect these goals to interfere with each other to some degree such that this confluence is unlikely. Are your expectations different?
Alignment and capabilities don’t necessarily correlate, and that accounts for a lot of why my p(doom) is so high. But more aligned agents are, in principle, more useful, so rational organizations should be motivated to pursue aligned AGI, not just AGI. Unfortunately, alignment research seems barely tractable, while capabilities can be brute-forced (and look valuable in the short term). Corporate incentive structures being what they are, what we’re seeing in practice is a reckless amount of risk taking. Regulation could alter the incentives to balance the externality with appropriate costs.
We have already identified some key resources involved in AI development that could be restricted. The economic bottlenecks are mainly around high energy requirements and chip manufacturing.
Energy is probably too connected to the rest of the economy to be a good regulatory lever, but the U.S. power grid can’t currently handle the scale of the data centers the AI labs want for model training. That might buy us a little time. Big tech is already talking about buying small modular nuclear reactors to power the next generation of data centers. Those probably won’t be ready until the early 2030s. Unfortunately, that also creates pressures to move training to China or the Middle East where energy is cheaper, but where governments are less concerned about human rights.
A recent hurricane flooding high-purity quartz mines made headlines because chip producers require it for the crucibles used in making silicon wafers. Lower purity means accidental doping of the silicon crystal, which means lower chip yields per wafer, at best. Those mines aren’t the only source, but they seem to be the best one. There might also be ways to utilize lower-purity materials, but that might take time to develop and would require a lot more energy, which is already a bottleneck.
The cutting-edge chips needed for AI training runs require delicate and expensive extreme-ultraviolet lithography machines to manufacture. They literally have to plasmify tin droplets with a pulsed laser to reach those frequencies. ASML is currently the only company that sells these systems, and machines that advanced have their own supply chains. ASML has very few customers, and (last I checked) only TSMC was really using the machines successfully at scale. There are a lot of potential policy levers in this space, at least for now.
Gladstone has an Action Plan. There’s also https://www.narrowpath.co/.