I think it might be helpful to have a variant of 3a that likewise says the orthogonality thesis is false, but is not so optimistic as to say the alternative is that AI will be “benevolent by default”. One way the orthogonality thesis could be false: an AI capable of human-like behavior (and buildable with near-future computing power, say no more than the computing power needed for mind uploading) might have to be significantly more similar to biological brains than current AI approaches are. In particular, it might have to go through an extended period of embodied social learning like a child’s, with this learning depending on certain kinds of sociable drives along with related features like curiosity, playfulness, a bias towards sensory data a human might consider “complex” and “interesting”, etc. This degree of convergence with biological structure and drives might make it unlikely that such an AI would end up optimizing for arbitrary goals we would see as boring and monomaniacal, like paperclip-maximizing, but it wouldn’t guarantee friendliness towards humans either. It’d be more akin to reaching into a parallel universe where a language-using intelligent biological species had evolved from different ancestors, grabbing a bunch of their babies, and raising them in human society. They might be similar enough to learn language and engage in the same kind of complex problem-solving as humans, but even if they didn’t pursue what we would see as boring/monomaniacal goals, their drives and values might be different enough to cause conflict.
Eliezer Yudkowsky’s 2013 post at https://www.facebook.com/yudkowsky/posts/10152068084299228 imagined a “cosmopolitan cosmist transhumanist” who would be OK with a future dominated by beings significantly different from us, but who still wants future minds to “fall somewhere within a large space of possibilities that requires detailed causal inheritance from modern humans”, as opposed to minds completely outside of this space like paperclip maximizers (he made a similar point in his tweet this May at https://twitter.com/ESYudkowsky/status/1662113079394484226). So one could have a scenario where orthogonality is false in the sense that paperclip-maximizer-type AIs aren’t overwhelmingly likely even if we fail to develop good alignment techniques. In that scenario, the degree of convergence with biological brains might be sufficient that we’re likely to get a mind a cosmopolitan cosmist transhumanist would be OK with (one that would still pursue science, art, etc.), and yet we couldn’t be confident we’d get something completely benevolent by default towards human beings. I’m a sort of Star Trek style optimist about different intelligent beings with broadly similar goals being able to live in harmony, especially in some kind of post-scarcity future of widespread abundance, but it’s just a hunch. Even if orthogonality is false in the way I suggested, I don’t think there’s any knock-down argument that creating a new form of intelligence would be free of risk to humanity.