The misuse risks seem much more important, both as real risks and in their salience to ordinary people.
Thanks for your comment. I agree that it may be easier to persuade the general public about misuse risks, and that these risks are likely to materialize if we achieve intent alignment. But in terms of assessing the relative probability, “if we solve alignment” is a significant “if.” I take it you view solving intent alignment as not all that unlikely? If so, why? Specifically, how do you expect we will figure out how to prevent deceptive alignment and goal misgeneralization by the time we reach AGI?
Also, in the article you linked, you base your scenario on the assumption of a slow takeoff. Why do you expect this will be the case?
I don’t think we should adopt an ignorance prior over goals. Humans are going to try to assign goals to AGI. Those goals will very likely involve humans somehow.
Of course humans will try to assign human-related goals to AGI. But if the AI is misaligned, how likely is it that the attempt to instill human-related goals actually leads to outcomes involving conscious humans, rather than molecular smiley faces?