To make a more specific claim (to be evaluated separately), I mostly expect this due to speed advantage, combined with examples of how human concepts are alien relative to those of analogously speed-disadvantaged living systems. For instance, most plants and somatic (non-neuronal) animal components use a lot of (very slow) electrical signalling to make very complex decisions (e.g., morphogenesis and healing; see Michael Levin’s work on reprogramming regenerative organisms by decoding their electrical signalling). To the extent that these living systems (plants and animal parts) use “concepts” in the course of their complex decision-making, those concepts currently seem quite alien to us, and many people (including some likely responders to this comment) will say that plants and somatic animal components entirely lack intelligence and do not make decisions. I’m not trying to argue for some kind of panpsychism or expanding circle of compassion here. I am just pointing out a large body of research (again, start with Levin) showing complex and robust decision-making within plants and (even more so) animal bodies, which humans consider relatively “unintelligent” or at least “not thinking in what we regard to be valid abstract concepts”. I think there will be a similar disparity between humans and A(G)I after it runs for a while (say, 1000 subjective civilization-years, or a few days to a year of human-clock-time).
I expect lots of alien concepts in domains where AI far surpasses humans (e.g. I expect this to be true of AlphaFold). But if you look at the text of the ruin argument:
Nobody knows what the hell GPT-3 is thinking, not only because the matrices are opaque, but because the stuff within that opaque container is, very likely, incredibly alien—nothing that would translate well into comprehensible human thinking, even if we could see past the giant wall of floating-point numbers to what lay behind.
I think this is pretty questionable. I expect that a good chunk of GPT-3’s cognition is something that could be translated into something comprehensible, mostly because I think humans are really good at language and GPT-3 is only somewhat better on some axes (and worse on others). I don’t remember what I said on this survey but right now I’m feeling like it’s “Unclear”, since I expect lots of AIs to have lots of alien concepts, but I don’t think I expect quite as much alienness as Eliezer seems to expect.
(And this does seem to materially change how difficult you expect alignment to be; on my view you can hope that in addition to all the alien concepts the AI also has regular concepts about “am I doing what my designers want” or “am I deceiving the humans” which you could then hope to extract with interpretability.)