So, human values are fragile, vague, and possibly not even a well-defined concept, yet figuring them out seems essential for an aligned AI. It seems reasonable that, faced with a hard problem, one would start instead with a simpler problem that has some connection to the original. For someone not working in ML or AI alignment, it seems obvious that researching simpler-than-human values might be a way to make progress. But maybe this is one of those falsely obvious ideas that non-experts tend to push after a cursory look at a complex research topic.
That said, assuming that value complexity scales with intelligence, studying less intelligent agents and their versions of values may be something worth pursuing. Dolphin values. Monkey values. Dog values. Cat values. Fish values. Amoeba values. Sure, we lose the inside view in this case, but the trade-off seems at least worth exploring. Is there any research going on in that area?
Yes. See:
Mammalian Value Systems
Gopal P. Sarma, Nick J. Hay (submitted on 28 Jul 2016 (v1), last revised 21 Jan 2019 (this version, v4))
Thanks, that’s interesting! They don’t do a lot with the question, but at least they ask it.
There are a couple of follow-up articles by the authors, which can be found by putting the title of this paper into Google Scholar and looking at the citations.
https://www.lesswrong.com/posts/cmrtpfG7hGEL9Zh9f/the-scourge-of-perverse-mindedness?commentId=jo7q3GqYFzhPWhaRA