Something very bad might happen, but something very good might happen too, and I am not sure how to compare the probabilities.
A misaligned AI will probably just kill us all. An AI that would torture us instead seems like one that is almost aligned, yet misaligned in exactly the worst way. (Or perhaps perfectly aligned with some psychopath who enjoys torture.) What is the probability of getting alignment almost right, but wrong in that specific way? No idea.
It may be tempting to translate “no idea” into “50% chance of heaven, 50% chance of hell” (and perhaps conclude that it’s not worth it), but that’s probably not how this works.