So, human values are fragile, vague, and possibly not even a well-defined concept, yet figuring them out seems essential for an aligned AI. It seems reasonable that, faced with a hard problem, one would start instead with a simpler problem that has some connection to the original. For someone not working in ML or AI alignment, it seems obvious that researching simpler-than-human values might be a way to make progress. But maybe this is one of those falsely obvious ideas that non-experts tend to push after a cursory look at a complex research topic.
That said, assuming that value complexity scales with intelligence, studying less intelligent agents and their versions of values may be something worth pursuing. Dolphin values. Monkey values. Dog values. Cat values. Fish values. Amoeba values. Sure, we lose the inside view in this case, but the trade-off seems at least worth exploring. Is there any research going on in that area?
Yes. See:
Mammalian Value Systems
Gopal P. Sarma, Nick J. Hay (submitted on 28 Jul 2016 (v1), last revised 21 Jan 2019 (this version, v4))
Thanks, that’s interesting! They don’t do a lot with the question, but at least they ask it.
There are a couple of follow-up articles by the authors, which can be found by putting the title of this paper into Google Scholar and looking at the citations.
https://www.lesswrong.com/posts/cmrtpfG7hGEL9Zh9f/the-scourge-of-perverse-mindedness?commentId=jo7q3GqYFzhPWhaRA