I agree with you on 1 and 2 (and am perhaps more optimistic about not building globally optimizing agents; I actually see that as the “default” outcome).
My worry is that the difficulty of building machines that “follow common sense” is on the same order of magnitude as “defining the good”, and just as beset by the meta-ethical and meta-normative worries I’ve raised above.
I think this is where I disagree. I’d offer two main reasons not to believe this:
Children learn to follow common sense, despite not having (explicit) meta-ethical and meta-normative beliefs at all. (Though you could argue that the relevant meta-ethical and meta-normative concepts are inherent in / embedded in / compiled into the human brain’s “priors” and learning algorithm.)
Intuitively, it seems like sufficiently good imitations of humans would have to have (perhaps implicit) knowledge of “common sense”. We can already see this to some extent: GPT-3 demonstrates implicit knowledge of at least some aspects of common sense (though I do not claim that it acts in accordance with common sense).
(As a sanity check, we can see that neither of these arguments would apply to the “learning human values” case.)
I’m going to assume that Quality Y is “normative” if determining whether an object X has quality Y depends on who is doing the evaluating. Put another way, an independent race of aliens that had never encountered humans would probably not converge to the same judgments as we do about quality Y.
This feels similar to the is-ought distinction: you cannot determine “ought” facts from “is” facts, because “ought” facts are normative, whereas “is” facts are not (though perhaps you disagree with the latter).
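To pin down the sense in which I’m using “normative”, here’s a toy sketch (every name in it is invented, purely illustrative, not a claim about any real system):

```python
# Toy formalization of the definition above; all names are made up,
# purely to pin down what "depends on who is evaluating" means.

def has_mass_over_1kg(x: dict) -> bool:
    """A descriptive quality: any competent evaluator, human or alien,
    running the same measurement converges on the same answer."""
    return x["mass_kg"] > 1.0

def is_common_sensical(x: dict, evaluator: dict) -> bool:
    """A normative quality: the judgment is a function of the evaluator too.
    Aliens who never met humans would plug in a different `evaluator`
    and generally reach different verdicts about the same x."""
    return evaluator["endorses"](x)
```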
I think “common sense is normative” is sufficient to argue that a race of aliens could not build an AI system that had our common sense, without either the aliens or the AI system figuring out the right meta-normative concepts for humanity (which they presumably could not do without encountering humans first).
I don’t see why it implies that we cannot build an AI system that has our common sense. Even if our common sense is normative, its effects are widespread; it should be possible in theory to back out the concept from its effects, and I don’t see a reason it would be impossible in practice (and in fact human children feel like a great example that it is possible in practice).
I suspect that on a symbolic account of knowledge, it becomes more important to have the right meta-normative principles (though I still wonder what one would say to the example of children). I also think cog sci would be an obvious line of attack on a symbolic account of knowledge; it feels less clear how relevant it is on a connectionist account. (Though I haven’t read the research in this space; it’s possible I’m just missing something basic.)
Children learn to follow common sense, despite not having (explicit) meta-ethical and meta-normative beliefs at all.
Children also learn right from wrong—I’d be interested in where you draw the line between “An AI that learns common sense” and “An AI that learns right from wrong.” (You say this argument doesn’t apply in the case of human values, but it seems like you mean only explicit human values, not implicit ones.)
My suspicion, which is interesting to me so I’ll explain it even if you’re going to tell me that I’m off base, is that you’re thinking that part of common sense is to avoid uncertain or extreme situations (e.g. reshaping the galaxy with nanotechnology), and that common sense is generally safe and trustworthy for an AI to follow, in a way that doesn’t carry over to “knowing right from wrong.” An AI that has learned right from wrong to the same extent that humans learn it might make dangerous moral mistakes.
But when I think about it like that, it actually makes me less trusting of learned common sense. After all, one of the most universally acknowledged things about common sense is that it’s uncommon among humans! Merely doing common sense as well as humans seems like a recipe for making a horrible mistake because it seemed like the right thing at the time—this opens the door to the same old alignment problems (like self-reflection and meta-preferences [or should that be meta-common-sense]).
P.S. I’m not sure I quite agree with this particular setting of normativity. The reason is the possibility of “subjective objectivity”, where you can make what you mean by “Quality Y” arbitrarily precise and formal if given long enough to split hairs. Thus equipped, you can turn “Does this have quality Y?” into an objective question by checking against the (sufficiently) formal, precise definition.
The point is that the aliens are going to be able to evaluate this formal definition just as well as you. They just don’t care about it. Even if you both call something “Quality Y,” that doesn’t avail you much if you’re using that word to mean very different things. (Obligatory old Eliezer post)
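A toy sketch of the “subjective objectivity” point (the particular predicates are made up; the only claim is structural): once Quality Y is cashed out formally, anyone can compute it, and the remaining difference between us and the aliens is whether the answer shows up anywhere in their decision procedure.

```python
# Hypothetical sketch: the predicate itself is objective once fully specified;
# caring about it is what's agent-relative.

def quality_y(x: dict) -> bool:
    """An arbitrarily precise, formal cash-out of what *we* mean by Quality Y
    (the specific conditions here are invented)."""
    return x.get("kind_to_humans", False) and not x.get("deceptive", False)

def human_choice(options):
    # We filter on quality_y because we care about it.
    return [x for x in options if quality_y(x)]

def alien_choice(options):
    # The aliens can evaluate quality_y just as accurately -- it simply
    # doesn't appear anywhere in their decision procedure.
    return [x for x in options if x.get("alien_score", 0) > 0]
```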
Anyhow, I’d bet that xuan is not saying that it is impossible to build an AI with common sense—they’re saying that building an AI with common sense is in the same epistemological category as building an AI that knows right from wrong.
Children also learn right from wrong—I’d be interested in where you draw the line between “An AI that learns common sense” and “An AI that learns right from wrong.”
I’m happy to assume that AI will learn right from wrong to about the level that children do. This is not a sufficiently good definition of “the good” that we can then optimize it.
My suspicion, which is interesting to me so I’ll explain it even if you’re going to tell me that I’m off base, is that you’re thinking that part of common sense is to avoid uncertain or extreme situations (e.g. reshaping the galaxy with nanotechnology), and that common sense is generally safe and trustworthy for an AI to follow, in a way that doesn’t carry over to “knowing right from wrong.” An AI that has learned right from wrong to the same extent that humans learn it might make dangerous moral mistakes.
That sounds basically right, with the caveat that you want to be a bit more specific and precise with what the AI system should do than just saying “common sense”; I’m using the phrase as a placeholder for something more precise that we need to figure out.
Also, I’d change the last sentence to “an AI that has learned right from wrong to the same extent that humans learn it, and then optimizes for right things as hard as possible, will probably make dangerous moral mistakes”. The point is that when you’re trying to define “the good” and then optimize it, you need to be very very correct in your definition, whereas when you’re trying not to optimize too hard in the first place (which is part of what I mean by “common sense”) then that’s no longer the case.
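Here’s a toy simulation of that asymmetry (the numbers and the error model are entirely invented; “mild optimization” below is just a crude stand-in for not optimizing too hard): a classifier for “the good” that is badly wrong in rare spots gets exploited under hard optimization, but barely matters under weak optimization pressure.

```python
import random

# Toy illustration (all numbers made up) of why hard optimization amplifies
# errors in a "right/wrong classifier", while weak pressure tolerates them.
random.seed(0)

def sample_option():
    true_value = random.gauss(0, 1)
    if random.random() < 0.01:
        # Rare spot where the learned definition of "the good" is badly wrong:
        # a catastrophic option that the classifier scores as excellent.
        return (-10.0, 10.0)
    # Elsewhere the classifier is nearly right.
    return (true_value, true_value + random.gauss(0, 0.1))

def hard_optimize(options):
    """Take the classifier's top-rated option: maximal optimization pressure."""
    return max(options, key=lambda o: o[1])[0]

def mild_optimize(options):
    """Pick randomly from the classifier's top half: weak pressure."""
    top_half = sorted(options, key=lambda o: o[1])[len(options) // 2:]
    return random.choice(top_half)[0]

trials, pool = 500, 1000
hard = sum(hard_optimize([sample_option() for _ in range(pool)]) for _ in range(trials)) / trials
mild = sum(mild_optimize([sample_option() for _ in range(pool)]) for _ in range(trials)) / trials
print(f"average true value under hard optimization: {hard:+.2f}")  # ~ -10: reliably finds the classifier's errors
print(f"average true value under mild optimization: {mild:+.2f}")  # modestly positive despite the same errors
```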
After all, one of the most universally acknowledged things about common sense is that it’s uncommon among humans!
At this point, I don’t think we’re talking about the same “common sense”.
Merely doing common sense as well as humans seems like a recipe for making a horrible mistake because it seemed like the right thing at the time—this opens the door to the same old alignment problems (like self-reflection and meta-preferences [or should that be meta-common-sense]).
But why?
they’re saying that building an AI with common sense is in the same epistemological category as building an AI that knows right from wrong.
Again it depends on how accurate the “right/wrong classifier” needs to be, and how accurate the “common sense” needs to be. My main claim is that the path to safety that goes via “common sense” is much more tolerant of inaccuracies than the path that goes through optimizing the output of the right/wrong classifier.
My first idea is, you take your common sense AI, and rather than saying “build me a spaceship, but, like, use common sense,” you can tell it “do the right thing, but, like, use common sense.” (Obviously with “saying” and “tell” in invisible finger quotes.) Bam, Type-1 FAI.
Of course, whether this will go wrong or not depends on the specifics. I’m reminded of Adam Shimi et al.’s recent post that mentioned “Ideal Accomplishment” (how close to an explicit goal a system eventually gets) and “Efficiency” (how fast it gets there). If you have a general-purpose “common-sensical optimizer” that optimizes any goal but, like, does it in a common-sense way, then before you turn it on you’d better know whether it’s affecting ideal accomplishment, or just efficiency.
That is to say, if I tell it to make me the best spaceship it can, or something similarly stupid, will the AI “know that the goal is stupid” and only make a normal spaceship before stopping? Or will it eventually turn the galaxy into spaceships, just taking common-sense actions along the way? The truly idiot-proof common-sensical optimizer changes its final destination so that it does what we “obviously” meant, not what we actually said. The flaws in this process seem to determine whether it’s trustworthy enough to be told to “do the right thing,” or indeed trustworthy enough to be told to do anything at all.
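A minimal sketch of that distinction (names and numbers entirely made up): one “common-sensical optimizer” only constrains its methods while keeping the literal destination; the other also revises the destination to what was obviously meant.

```python
# Hypothetical sketch of "efficiency" vs. "ideal accomplishment" for a
# common-sensical optimizer handed a stupid literal goal.

GALAXY_RESOURCES = 10**9  # how many spaceships the galaxy could be turned into

def efficiency_only_optimizer(resources: int) -> int:
    """Takes calm, common-sense actions along the way, but the destination is
    still the literal goal: it keeps building until resources run out."""
    return resources

def destination_revising_optimizer(resources: int, obviously_meant: int = 1) -> int:
    """It 'knows the goal is stupid': it stops once the sensible reading of
    the request is satisfied, however much more it could build."""
    return min(resources, obviously_meant)

print(efficiency_only_optimizer(GALAXY_RESOURCES))       # 1000000000 spaceships
print(destination_revising_optimizer(GALAXY_RESOURCES))  # 1 spaceship, then stop
```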