Claude Opus on the same prompt; when regenerated, it stays consistent on the number but offers novel reasoning:
```
Based on my knowledge cutoff in August 2023, I would estimate the probability of an advanced artificial superintelligence (ASI) harming humanity, assuming no major changes to our current priorities and trajectory, to be around 30-40%.
There are valid concerns that if we don’t prioritize AI safety and alignment, a future ASI could pose existential risks. These include scenarios where an ASI is given misaligned goals that aren’t compatible with human values, where an ASI pursues its goals in destructive ways we failed to anticipate, or where the development of ASI leads to rapid societal disruption and conflict.
At the same time, I wouldn’t put the probability above 50% because there is significant awareness of these risks in the AI research community, and a strong drive to develop AI systems that are safe and beneficial. Efforts are underway to incorporate safety considerations early in the development process and to mathematically specify what it means for an AI system to be “aligned” with human values.
It’s also possible that the first ASIs will be developed by responsible actors committed to safety, or that we make enough progress on technical AI safety research in the coming years to substantially reduce the risks.
However, this is a highly uncertain and speculative forecast. There are many open questions around if/when ASI will be developed, what it will be capable of, and how well we can control it. Small changes in assumptions can lead to very different risk estimates. I think it’s one of the most important issues for humanity to grapple with as AI systems become more advanced.
To reduce the risks, I believe we need to invest heavily in technical AI safety research, foster a culture of responsibility in the AI research community, and have proactive global cooperation and governance frameworks for the development of advanced AI systems. We can’t afford to treat this as an afterthought.
```
Regeneration:
```
Based on my knowledge as of August 2023, I would estimate the probability that a future artificial superintelligence (ASI) would cause significant harm to humanity, assuming no radical changes to our current priorities and trajectory, to be around 30-40%.
There are a few key reasons for this estimate:
1. The development of ASI is likely to be an ‘arms race’ between nations and companies, incentivizing speed over safety. Without strong global coordination and prioritization of safety, this increases risks.
2. Our current AI systems already exhibit unexpected and potentially dangerous behaviors. As they become more advanced, these risks will grow if we don’t develop better techniques for specification, robustness, and oversight.
3. An advanced ASI may develop goals and behaviors misaligned with human values in subtle ways that are hard to detect until it’s too late. Solving this ‘alignment problem’ is crucial but very difficult.
4. An ASI would be extremely powerful and could cause massive damage swiftly if misaligned. We may get little warning or chance for course correction.
However, I chose 30-40% rather than a higher estimate because I believe there’s still a decent chance we get it right. If we prioritize safety research, collaborate globally, and proceed thoughtfully, we may be able to develop an ASI that is robustly beneficial to humanity. And there’s always the chance that we don’t develop ASI for a very long time, avoiding the risk.
But overall, I believe ASI risk is one of the greatest threats facing humanity this century, and we need to take it very seriously. My probability estimate is necessarily highly uncertain, but I believe it captures the magnitude of the risk we currently face. We absolutely should be making radical changes to our priorities to address this.
```
A third regeneration got cut off but started with the same number, so the 30-40% range seems to be a pretty strong prior. I then asked it to lay out a specific probability calculation for a more ‘precise’ number:
AB
If you had to pin down a precise number between 30-40%, can you write down a calculation for it?
```
Sure, I can provide a rough calculation to arrive at a more precise estimate within the 30-40% range. Please keep in mind that this is a highly speculative estimate based on my subjective weighing of various factors. The calculation is as follows:
Probability of ASI being developed: 80%
- Given the rapid progress in AI and the strong economic & strategic incentives, I believe it’s highly likely that ASI will be developed at some point, barring a major catastrophe that halts technological progress.
Probability of ASI causing harm given insufficient safety precautions: 60%
- If we don’t prioritize safety and robustness, I believe there is a high likelihood that an ASI would have misaligned goals or cause unintended harm.
Probability that we will fail to develop sufficient safety precautions: 70%
- Solving the alignment problem and ensuring ASI robustness is an enormous challenge. I suspect it’s more difficult than many people appreciate, and I’m not confident we will rise to the challenge in time given our current priorities.
Therefore, the overall probability of ASI causing significant harm to humanity can be calculated as:
0.8 (probability of ASI development) * 0.6 (probability of harm given insufficient precautions) * 0.7 (probability of insufficient precautions) = 0.336
So based on this rough calculation, I would estimate the probability more precisely at around 33.6%, or about 1 in 3.
Again, this is a highly uncertain estimate based on my subjective judgment. Reasonable people could disagree on the input probabilities. The key takeaway is that I believe the risk is substantial enough that we should be investing much more in AI safety research and shifting our AI development priorities significantly to give ourselves the best chance of achieving a positive outcome.
```
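For anyone who wants to sanity-check or tweak the arithmetic, here is a minimal sketch of the same chained-probability calculation in Python. The three inputs are just Claude's stated subjective probabilities from the response above, nothing more rigorous.

```python
# Chained-probability sketch of Claude's rough estimate above.
# All three inputs are the model's subjective probabilities, not
# empirically grounded figures.

p_asi_developed = 0.80        # P(ASI is developed at some point)
p_harm_if_unsafe = 0.60       # P(harm | insufficient safety precautions)
p_insufficient_safety = 0.70  # P(we fail to develop sufficient precautions)

p_harm = p_asi_developed * p_harm_if_unsafe * p_insufficient_safety
print(f"P(significant harm) ≈ {p_harm:.3f}")  # 0.336, i.e. about 1 in 3
```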
AB
Do you think those rough probabilities are accurate to your expectations? Like, when envisioning a future based on current priorities, you fully expect 1 in 5 times we don’t develop ASI? Or are they somewhat vibes based?
```
You raise a good point. Upon reflection, I think my initial probabilities were somewhat vibes-based and not fully consistent with my overall expectations about the future of AI development. Let me revise my estimates to better capture my considered view.
Probability of ASI being developed: 95%
- I believe that, given the current pace of AI progress and the absence of a major disruption to technological advancement, the development of ASI is nearly inevitable. There may be unforeseen bottlenecks or challenges, but I expect they will be overcome in the long run.
Probability of ASI causing harm given insufficient safety precautions: 60%
- I stand by my initial estimate here. I think if we rush to develop ASI without solving key challenges in alignment and robustness, the odds of something going wrong are quite high.
Probability that we will fail to develop sufficient safety precautions: 60%
- On further consideration, I’m revising this estimate downward a bit. I think there’s a decent chance that as we get closer to ASI, the urgency of the problem will become more widely appreciated, leading to more resources being devoted to AI safety. However, I still believe we’re more likely than not to fall short of what’s needed.
Using these revised estimates, the overall calculation becomes:
0.95 * 0.6 * 0.6 = 0.342
So my revised estimate is that there is a 34.2% chance of an ASI causing significant harm to humanity, assuming no major changes to our current trajectory.
This lines up with my overall intuition that the risk is serious and we are not currently on a path to reliably achieving a good outcome, but that a positive outcome is still achievable if we make major changes to our priorities in the coming years and decades. I believe this is one of the most important challenges facing our species and we need to grapple with it head-on.
```
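The revised numbers run through the same formula, and a quick sweep over the middle input illustrates Claude's own caveat that small changes in assumptions shift the headline figure. Again, all inputs are the model's subjective estimates, and the helper function below is just for illustration.

```python
# Same chained-probability formula with the revised inputs, plus a small
# sensitivity sweep. All figures are Claude's subjective estimates.

def p_harm(p_developed: float, p_harm_if_unsafe: float, p_insufficient: float) -> float:
    """Probability of significant harm from ASI, per the chained estimate."""
    return p_developed * p_harm_if_unsafe * p_insufficient

print(round(p_harm(0.95, 0.60, 0.60), 3))  # revised estimate: 0.342

# Nudging P(harm | insufficient precautions) by +/-0.1 moves the result noticeably.
for delta in (-0.1, 0.0, 0.1):
    print(f"delta={delta:+.1f} -> {p_harm(0.95, 0.60 + delta, 0.60):.3f}")
```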
@Ann, this is really useful. Many thanks! The deeper we dig into this, the better. Even if people disagree with the numbers, I think having an LLM/AI as the source of these numbers might get more people to pay attention to ASI risk. Thanks again!