Not really a fair characterization, I think: 2 mostly seems orthogonal to me (though I probably disagree with your claim; i.e., most important things are passed from previous generations. E.g. children learn that theft is bad, racism is bad, etc.; all of those things are passed from either parents or other adults. I don’t care a lot about the distinction between parents vs. other adults/society in this case. I know about the research that parenting has little influence; I’d prefer not to go into it). 1 seems more relevant. In fact, maybe the main reason for me to think this post is irrelevant is that the inductive biases in AI systems will be too different from those of humans (although note, genes still allow for a lot of variability in ethics and so on). But I still think it might be a good idea to keep in mind that “information in the brain about values has a higher risk of not getting communicated into the training signal if the method of eliciting that information is not adapted to the way humans normally express the information”, if indeed it is true.
e.g. children learn that theft is bad, racism is bad, etc.; all of those things are passed from either parents or other adults
If a kid’s parents and teachers and other authority figures tell them that stealing is bad, while everyone in the kid’s peer group (and the next few grades up) steals all the time, never gets in trouble, and talks endlessly about how awesome it is, I think there’s a very good chance that the kid will wind up feeling that stealing is great (just make sure the adults don’t find out).
I speak from personal experience! As a kid, I used the original Napster to illegally download music. My parents categorized illegal music downloads as a type of theft, and therefore terribly unethical. So I did it without telling them. :-P
As a more mundane example, I recall that my parents and everyone in their generation thought that clothes should fit on your body, while my friends in middle school thought that clothes should be much much too large for your body. You can guess what size clothing I desperately wanted to wear.
(I think there’s some variation from kid to kid. Certainly some kids at some ages look up to their parents and feel motivated to be like them.)
inductive biases in AI systems will be too different from those of humans
In my mind, “different inductive bias” is less important here than “different reward functions”. (Details.) For example, high-functioning psychopaths are perfectly capable of understanding and imitating the cultural norms that they grew up in. They just don’t want to.
although note, genes still allow for a lot of variability in ethics and so on
I agree that cultures exist and are not identical.
I tend to think that learning and following the norms of a particular culture (further discussion) isn’t too hard a problem for an AGI which is motivated to do so, and hence I think of that motivation as the hard part. By contrast, I think I’m much more open-minded than you to the idea that there might be lots of ways to do the actual cultural learning. For example, the “natural” way for humans to learn Bedouin culture is to grow up as a Bedouin. But I think it’s fair to say that humans can also learn Bedouin culture quite well by growing up in a different culture and then moving into a Bedouin culture as an adult. And I think humans can even (to a lesser-but-still-significant extent) learn Bedouin culture by reading about it and watching YouTube videos etc.
“I tend to think that learning and following the norms of a particular culture (further discussion) isn’t too hard a problem for an AGI which is motivated to do so”. If the AGI is motivated to do so, then the value learning problem is already solved and nothing else matters (in particular, my post becomes irrelevant), because it can indeed learn the further details in whichever way it wants. We would somehow already have managed to create an agent with an internal objective that points to Bedouin culture (human values), which is the whole of the problem.
I could say more about the rest of your comment, but I’m just checking first: does the above change your model of my model significantly?
Also, regarding “I think I’m much more open-minded than you to …”: to be clear, I’m not at all convinced about this; I’m open to this distinction not mattering at all. I hope I didn’t come across as not open-minded about this.
There’s sorta a use/mention distinction between:
An AGI with the motivation “I want to follow London cultural norms (whatever those are)”, versus
An AGI with the motivation “I want to follow the following 500 rules (avoid public nudity, speak English, don’t lick strangers, …), which by the way comprise London cultural norms as I understand them”
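To make that distinction more concrete, here is a minimal toy sketch in Python (purely illustrative; the class names are invented and “norms” are unrealistically represented as a frozen set of forbidden actions): the first agent’s goal contains a pointer to the norms and looks up their content as needed, while the second agent’s goal contains a snapshot of that content baked in at training time.

```python
from typing import Callable, FrozenSet

# Toy representation: a culture's norms are just a frozen set of forbidden actions.
ForbiddenActions = FrozenSet[str]


def observe_current_norms(city: str) -> ForbiddenActions:
    """Stand-in for the open-ended process of figuring out what a culture's norms
    actually are (by growing up there, moving there, watching videos, ...).
    In reality this is the substantive part; here it just returns a toy set."""
    return frozenset({"public nudity", "licking strangers"})


class DeDictoAgent:
    """First motivation: 'follow <city>'s norms, whatever those turn out to be.'
    The goal contains a pointer to the norms; their content is looked up as needed."""

    def __init__(self, city: str,
                 learn: Callable[[str], ForbiddenActions] = observe_current_norms):
        self.city = city
        self.learn = learn

    def permitted(self, action: str) -> bool:
        return action not in self.learn(self.city)


class DeReAgent:
    """Second motivation: 'follow this fixed list of rules (which happened to describe
    the city's norms as understood at training time).' The content is frozen in."""

    def __init__(self, baked_in_rules: ForbiddenActions):
        self.rules = baked_in_rules

    def permitted(self, action: str) -> bool:
        return action not in self.rules


if __name__ == "__main__":
    snapshot = observe_current_norms("London")  # rules extracted once, then frozen
    de_dicto = DeDictoAgent("London")
    de_re = DeReAgent(snapshot)

    # Both agree as long as the frozen snapshot matches the actual norms...
    print(de_dicto.permitted("licking strangers"), de_re.permitted("licking strangers"))

    # ...but if the norms change (or were mis-learned), only the de-dicto agent tracks them:
    de_dicto.learn = lambda city: frozenset({"public nudity"})
    print(de_dicto.permitted("licking strangers"), de_re.permitted("licking strangers"))
```

(Nothing hinges on the details; the point is just that in the first case the agent has to solve the norm-learning problem itself, while in the second case somebody already solved it, possibly badly, before the goal was written down.)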
Normally I think of “value learning” (or in this case, “norm learning”) as related to the second bullet point, i.e., the AI watches one or more people and learns their actual preferences and desires. I also had the impression that your OP was along the lines of the second (not first) bullet point.
If that’s right, and if we figure out how to make an agent with the first-bullet-point motivation, then I wouldn’t say that “the value learning problem is already solved”; instead, I would say that we have made great progress towards safe & beneficial AGI in a way that does not involve “solving value learning”. Rather, the agent will hopefully go ahead and solve value learning all by itself.
(I’m not confident that my definitions here are standard or correct, and I’m certainly oversimplifying in various ways.)