Children learn to follow common sense, despite not having (explicit) meta-ethical and meta-normative beliefs at all.
Children also learn right from wrong—I’d be interested in where you draw the line between “An AI that learns common sense” and “An AI that learns right from wrong.” (You say this argument doesn’t apply in the case of human values, but it seems like you mean only explicit human values, not implicit ones.)
My suspicion, which is interesting to me so I’ll explain it even if you’re going to tell me that I’m off base, is that you’re thinking that part of common sense is to avoid uncertain or extreme situations (e.g. reshaping the galaxy with nanotechnology), and that common sense is generally safe and trustworthy for an AI to follow, in a way that doesn’t carry over to “knowing right from wrong.” An AI that has learned right from wrong to the same extent that humans learn it might make dangerous moral mistakes.
But when I think about it like that, it actually makes me less trusting of learned common sense. After all, one of the most universally acknowledged things about common sense is that it’s uncommon among humans! Merely doing common sense as well as humans seems like a recipe for making a horrible mistake because it seemed like the right thing at the time—this opens the door to the same old alignment problems (like self-reflection and meta-preferences [or should that be meta-common-sense]).
P.S. I’m not sure I quite agree with this particular framing of normativity. The reason is the possibility of “subjective objectivity”: you can make what you mean by “Quality Y” arbitrarily precise and formal if given long enough to split hairs. Thus equipped, you can turn “Does this have Quality Y?” into an objective question by checking against the (sufficiently) formal, precise definition.
The point is that the aliens are going to be able to evaluate this formal definition just as well as you. They just don’t care about it. Even if you both call something “Quality Y,” that doesn’t avail you much if you’re using that word to mean very different things. (Obligatory old Eliezer post)
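For concreteness, here’s a minimal sketch of what I mean (the predicate and both scoring functions are invented placeholders, not a claim about how anyone would actually formalize a value):

```python
# "Subjective objectivity": once I pin down what *I* mean by Quality Y as a
# precise, checkable predicate, evaluating it is an objective matter.
def has_quality_y(thing: dict) -> bool:
    """My arbitrarily hair-split (and entirely made-up) definition of Quality Y."""
    return thing.get("is_conscious", False) and thing.get("suffering", 0.0) == 0.0

world_state = {"is_conscious": True, "suffering": 0.0, "paperclips": 7}

# The aliens can run this check exactly as well as I can, and we agree on the answer.
print(has_quality_y(world_state))  # True, whoever evaluates it

# But sharing the check is not sharing the motivation.
def my_score(thing: dict) -> float:
    return 1.0 if has_quality_y(thing) else 0.0   # I steer toward Quality Y...

def alien_score(thing: dict) -> float:
    return float(thing.get("paperclips", 0))      # ...they steer toward something else entirely

print(my_score(world_state), alien_score(world_state))  # 1.0 7.0
```

The disagreement with the aliens isn’t about what the predicate returns; it’s about whether the predicate shows up in anyone’s scoring function.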
Anyhow, I’d bet that xuan is not saying that it is impossible to build an AI with common sense—they’re saying that building an AI with common sense is in the same epistemological category as building an AI that knows right from wrong.
Children also learn right from wrong—I’d be interested in where you draw the line between “An AI that learns common sense” and “An AI that learns right from wrong.”
I’m happy to assume that AI will learn right from wrong to about the level that children do. This is not a sufficiently good definition of “the good” that we can then optimize it.
My suspicion, which is interesting to me so I’ll explain it even if you’re going to tell me that I’m off base, is that you’re thinking that part of common sense is to avoid uncertain or extreme situations (e.g. reshaping the galaxy with nanotechnology), and that common sense is generally safe and trustworthy for an AI to follow, in a way that doesn’t carry over to “knowing right from wrong.” An AI that has learned right from wrong to the same extent that humans learn it might make dangerous moral mistakes.
That sounds basically right, with the caveat that you want to be a bit more specific and precise with what the AI system should do than just saying “common sense”; I’m using the phrase as a placeholder for something more precise that we need to figure out.
Also, I’d change the last sentence to “an AI that has learned right from wrong to the same extent that humans learn it, and then optimizes for right things as hard as possible, will probably make dangerous moral mistakes”. The point is that when you’re trying to define “the good” and then optimize it, you need to be very very correct in your definition, whereas when you’re trying not to optimize too hard in the first place (which is part of what I mean by “common sense”) then that’s no longer the case.
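As a toy illustration of that asymmetry (all the numbers and the error model below are invented, not claims about any real system):

```python
# Toy contrast: argmaxing an imperfect learned "goodness" score vs. merely
# satisficing on it. Error rates, value scales, and thresholds are made up.
import random

random.seed(0)
N = 100_000
plans = []
for _ in range(N):
    if random.random() < 0.001:
        # Rare edge case the learned judgment badly misreads: actually
        # catastrophic, but scored as wonderful.
        true_value, score = random.uniform(-100.0, -10.0), random.uniform(5.0, 10.0)
    else:
        # Ordinary case: the score is a noisy but decent estimate.
        true_value = random.gauss(0.0, 1.0)
        score = true_value + random.gauss(0.0, 0.3)
    plans.append((true_value, score))

# "Define the good and optimize it as hard as possible":
hard_pick = max(plans, key=lambda p: p[1])

# "Don't optimize too hard": accept any plan the score rates as clearly fine.
acceptable = [p for p in plans if p[1] > 1.0]
mild_pick = random.choice(acceptable)

print("hard optimization -> true value:", round(hard_pick[0], 1))  # almost surely one of the rare catastrophes
print("satisficing       -> true value:", round(mild_pick[0], 1))  # almost always genuinely fine
```

The very same learned score is accurate enough for the second strategy and nowhere near accurate enough for the first.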
After all, one of the most universally acknowledged things about common sense is that it’s uncommon among humans!
I don’t think we’re talking about the same “common sense” at this point.
Merely doing common sense as well as humans seems like a recipe for making a horrible mistake because it seemed like the right thing at the time—this opens the door to the same old alignment problems (like self-reflection and meta-preferences [or should that be meta-common-sense]).
But why?
they’re saying that building an AI with common sense is in the same epistemological category as building an AI that knows right from wrong.
Again it depends on how accurate the “right/wrong classifier” needs to be, and how accurate the “common sense” needs to be. My main claim is that the path to safety that goes via “common sense” is much more tolerant of inaccuracies than the path that goes through optimizing the output of the right/wrong classifier.
My first idea is, you take your common sense AI, and rather than saying “build me a spaceship, but, like, use common sense,” you can tell it “do the right thing, but, like, use common sense.” (Obviously with “saying” and “tell” in invisible finger quotes.) Bam, Type-1 FAI.
Of course, whether this will go wrong or not depends on the specifics. I’m reminded of Adam Shimi et al.’s recent post that mentioned “Ideal Accomplishment” (how close to an explicit goal a system eventually gets) and “Efficiency” (how fast it gets there). If you have a general-purpose “common-sensical optimizer” that optimizes any goal but, like, does it in a common-sense way, then before you turn it on you’d better know whether it’s affecting ideal accomplishment, or just efficiency.
That is to say, if I tell it to make me the best spaceship it can, or something similarly stupid, will the AI “know that the goal is stupid” and only make a normal spaceship before stopping? Or will it eventually turn the galaxy into spaceships, just taking common-sense actions along the way? The truly idiot-proof common-sensical optimizer changes its final destination so that it does what we “obviously” meant, not what we actually said. The flaws in this process seem to determine whether it’s trustworthy enough to tell to “do the right thing,” or trustworthy enough to tell to do anything at all.
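To pin down the contrast, here’s a deliberately dumb runnable sketch (the “world”, the thresholds, and every name in it are invented placeholders):

```python
# Two readings of "common-sensical optimizer" for the goal "best spaceship possible".
GALAXY_RESOURCES = 1_000_000   # everything the AI could eventually convert (arbitrary units)
PLAUSIBLY_MEANT = 1            # what the request "a spaceship" presumably intended
MAX_SENSIBLE_STEP = 10         # per-action restraint: no wild, extreme moves

def efficiency_only(literal_goal: int) -> int:
    """Common sense constrains each step, but the destination is untouched."""
    built = 0
    while built < literal_goal:
        built += min(MAX_SENSIBLE_STEP, literal_goal - built)  # modest-looking actions...
    return built  # ...that still arrive at the literal goal eventually

def ideal_accomplishment(literal_goal: int) -> int:
    """Common sense also reinterprets the destination as what was plausibly meant."""
    target = min(literal_goal, PLAUSIBLY_MEANT)
    built = 0
    while built < target:
        built += min(MAX_SENSIBLE_STEP, target - built)
    return built  # stops at "a normal spaceship"

print(efficiency_only(GALAXY_RESOURCES))       # 1000000: the galaxy is now spaceships
print(ideal_accomplishment(GALAXY_RESOURCES))  # 1: one spaceship, then it stops
```

Only the second version touches ideal accomplishment; the first just changes how politely the galaxy gets converted.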