There’s sorta a use/mention distinction between:
- An AGI with the motivation “I want to follow London cultural norms (whatever those are)”, versus
- An AGI with the motivation “I want to follow the following 500 rules (avoid public nudity, speak English, don’t lick strangers, …), which by the way comprise London cultural norms as I understand them”
Normally I think of “value learning” (or in this case, “norm learning”) as related to the second bullet point—i.e., the AI watches one or more people and learns their actual preferences and desires. I also had the impression that your OP was along the lines of the second (not first) bullet point.
If that’s right, and if we figure out how to make an agent with the first-bullet-point motivation, then I wouldn’t say that “the value learning problem is already solved”; instead, I would say that we have made great progress towards safe & beneficial AGI in a way that does not involve “solving value learning”. Rather, the agent will hopefully go ahead and solve value learning all by itself.
(I’m not confident that my definitions here are standard or correct, and I’m certainly oversimplifying in various ways.)