While I agree with much of this content, I think you guys (the anonymous authors) are most likely to be wrong in your disagreement with the “alien concepts” point (#33).
To make a more specific claim (to be evaluated separately): I mostly expect this due to the speed advantage, combined with examples of how human concepts are alien relative to those of analogously speed-disadvantaged living systems. For instance, most plants and somatic (non-neuronal) animal components use a lot of (very slow) electrical signalling to make very complex decisions (e.g., morphogenesis and healing; see Michael Levin’s work on reprogramming regenerative organisms by decoding their electrical signalling). To the extent that these living systems (plants, and animal-parts) utilize “concepts” in the course of their complex decision-making, at present those concepts seem quite alien to us, and many people (including some likely responders to this comment) will say that plants and somatic animal components entirely lack intelligence and do not make decisions. I’m not trying to argue for some kind of panpsychism or expanding circle of compassion here, just pointing out a large body of research (again, start with Levin) showing complex and robust decision-making within plants and (even more so) animal bodies, which humans consider relatively speaking “unintelligent”, or at least “not thinking in what we regard to be valid abstract concepts”. I think there will be a similar disparity between humans and A(G)I after it has run for a while (say, 1,000 subjective civilization-years, or a few days to a year of human clock time).
I expect lots of alien concepts in domains where AI far surpasses humans (e.g. I expect this to be true of AlphaFold). But if you look at the text of the ruin argument:
Nobody knows what the hell GPT-3 is thinking, not only because the matrices are opaque, but because the stuff within that opaque container is, very likely, incredibly alien—nothing that would translate well into comprehensible human thinking, even if we could see past the giant wall of floating-point numbers to what lay behind.
I think this is pretty questionable. I expect that a good chunk of GPT-3's cognition is something that could be translated into something comprehensible, mostly because I think humans are really good at language and GPT-3 is only somewhat better on some axes (and worse on others). I don’t remember what I said on this survey, but right now I’m feeling like it’s “Unclear”, since I expect lots of AIs to have lots of alien concepts, but I don’t think I expect quite as much alienness as Eliezer seems to expect.
(And this does seem to materially change how difficult you expect alignment to be; on my view you can hope that in addition to all the alien concepts the AI also has regular concepts about “am I doing what my designers want” or “am I deceiving the humans” which you could then hope to extract with interpretability.)
Also, I wonder to what extent our own “thinking” is based on concepts we ourselves understand. I’d bet I don’t really understand what concepts most of my own thinking processes use.
Like: what are the exact concepts I use when I throw a ball? Is there a term for velocity, the gravitational constant, or air friction, or is it just some completely “alien” computation, “inlined” and “tree-shaken” free of any unneeded abstractions, which just sends motor outputs given the target position? (For contrast, a concrete sketch of the explicit-physics version follows these questions.)
Or: what concepts do I use to know which word to put at this point in this sentence? Do I use concepts like “subject”, “verb”, or “sentiment”, or do I rather just go with the flow subconsciously, having only a vague idea of the direction I am going with this argument?
Or: what concepts do I really use when deciding to rotate the steering wheel 2 degrees to the right while driving a car through a forest road’s gentle turn? Do I think about “angles”, “asphalt”, “trees”, “centrifugal force”, or “tire friction”, or do I rather just try to push the future in the direction where the road ahead looks straighter to me, somehow just knowing that this steering-wheel motion is “straightening” the image I see?
Or: how exactly do I solve (not: verify an already written proof of) a math problem? How does the solution pop into my mind? Is there some systematic search over all possible terms and derivations, or rather some giant hash-map-like interconnected store of “related tricks and transformations I’ve seen before” from which candidates get proposed?
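To make the ball-throwing contrast concrete, here is a minimal, purely illustrative sketch (the function name and numbers are mine, not anything from the comment). The “explicit concepts” version really does contain named terms for distance, angle, and gravity; the “inlined” version the comment gestures at would instead be an opaque learned mapping from target position straight to motor outputs.

```python
import math

def launch_speed(distance_m: float, angle_rad: float, g: float = 9.81) -> float:
    """Explicit, concept-based version: invert the drag-free projectile
    range formula R = v**2 * sin(2*theta) / g to get the launch speed."""
    return math.sqrt(distance_m * g / math.sin(2 * angle_rad))

# The "inlined" alternative would look more like:
#   motor_command = learned_policy(target_position)
# with no named variable for gravity, velocity, or friction anywhere inside.

print(round(launch_speed(10.0, math.radians(45)), 2))  # ~9.9 m/s
```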
I think my point is that we should not conflate the way we actually solve problems (subconsciously?) with the way we talk (consciously) about solutions we’ve already found, whether verifying them ourselves (the inner monologue) or conveying them to another person. First of all, the Release and Debug binaries can differ (riding a bike for the first time is a completely different experience than the 2,000th attempt). Second, the on-the-wire format and the data structure before serialization can be very different (the way I explain how to solve an equation to my kid is not exactly how I solve it).
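A tiny, hypothetical illustration of that second analogy (all the names here are made up): the in-memory structure can carry scratch work and cross-references that simply never appear in the serialized form you hand to someone else.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Solution:
    # In-memory form: keeps the scratch work and cached intermediate values.
    steps: list
    scratch_work: dict = field(default_factory=dict)

sol = Solution(steps=["subtract 2 from both sides", "divide by 3"],
               scratch_work={"lhs_after_subtraction": 12.0, "x": 4.0})

# On-the-wire form: only what the recipient needs; the scratch work is gone.
wire = json.dumps({"steps": sol.steps})
print(wire)  # {"steps": ["subtract 2 from both sides", "divide by 3"]}
```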
I think that training a separate AI to interpret the inner workings of another AI for us is risky, in the same way that a public-relations department or a lawyer doesn’t necessarily give you an honest picture of what the client is really up to.
Also, there’s much talk about the distinction between System 1 and System 2, or the subconscious and the conscious, etc.
But do we really take seriously the implication of all that: the concepts the conscious part of our mind uses to “explain” subconscious actions have almost nothing to do with how those actions actually happened. If we force the AI to use these concepts, it will either lie to us (“Your honor, as we shall soon see, the defendant wanted to..”) or be crippled (have you tried to drive a car using just the concepts from a physics textbook?). But even in the latter case it looks like a lie to me, because even if the AI really is using the concepts it claims or appears to be using, there’s still a mismatch on my side: I think I now understand that the AI works just like me, while in reality I work completely differently than I thought. How bad that is depends on the problem domain, IMHO. It might be fine if the AI is solving a problem like “how to throw a ball”, where a program using physics equations is actually also a good way of doing it. But once we get to more complicated stuff, like operating an autonomous drone on a battlefield or governing a country’s budget, I think there’s a real risk, because we don’t know how we ourselves make these kinds of decisions.
Yes, this surprised me too. Perhaps it was the phrasing that they disagreed with? If you asked them about all possible intelligences in mindspace, and asked whether they thought AGI would fall very close to most human minds, maybe their answer would be different.