Thanks for your comment.
> The misuse risks seem much more important, both as real risks, and in their saliency to ordinary people.
I agree that it may be easier to persuade the general public about misuse risks, and that those risks are likely to materialize if we achieve intent alignment. But when it comes to assessing the relative probability of the two risks, “if we solve alignment” is a significant “if.” I take it you view solving intent alignment as not all that unlikely? If so, why? Specifically, how do you expect we will figure out how to prevent deceptive alignment and goal misgeneralization by the time we reach AGI?
Also, in the article you linked, you base your scenario on the assumption of a slow takeoff. Why do you expect this will be the case?
> I don’t think we should adopt an ignorance prior over goals. Humans are going to try to assign goals to AGI. Those goals will very likely involve humans somehow.
Of course humans will try to assign human-related goals to AGI, but if the AI is misaligned, how likely is it that the attempt to instill those goals actually leads to outcomes involving conscious humans rather than, say, molecular smiley faces?
Makes sense. What probability do you place on this? It would require solving alignment, a second AI being created before the first can establish a singleton, and then the misaligned AI choosing this kind of blackmail over other possible tactics. If the blackmail involves sentient simulations (as is sometimes suggested, although not in your comment), the misaligned AI would have to solve the hard problem of consciousness and be able to prove the simulations’ sentience to the other AI; the threat carries no weight if the simulations are not known to be sentient.