This led me to think… why do we even believe that human values are good? Perhaps typical human behaviour, amplified by the capabilities of a superintelligence, would actually destroy the universe. I don’t personally find this very likely (that’s why I never posted it before), but, given that almost all of AI safety is built, one way or another, around “how do we check that the AI’s values converge with human values?”, perhaps something else is worth trying: for example, remodeling history (actual, human history) from a given starting point (say, the Roman Principatus or 1945) with the actors assigned values different from human values (but standing in similar relationships to each other, where applicable), and seeing which assignments lead to better results (in particular, to us not being destroyed by 2020). All with the usual sandbox precautions, of course.
(Addendum: Of course, pace “fragility of value”, there should still be some inheritance from our metamorals. But we don’t actually know how compatible our morals (and the systems in “reliable inheritance” from them) are with our metamorals, especially in an environment as extreme as superintelligence.)
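To make the “remodeling history” idea slightly more concrete, here is a deliberately crude toy sketch in Python. Everything in it (the actor structure, the three value dimensions, the aggression/cooperation dynamic, the survival score) is invented purely for illustration; an actual simulation of this kind would have to look very different.

```python
import random

# Toy sketch only: actors are crude stand-ins for post-1945 powers, each with a
# weighting over three invented value dimensions (power, cooperation,
# preservation of others). Nothing here is a claim about real history.

def make_actor(name, values, strength=1.0):
    return {"name": name, "values": values, "strength": strength, "alive": True}

def step(actors, rng):
    """One 'year': each actor aggresses or cooperates, with probability set by
    how heavily it weights power relative to its other values."""
    for a in actors:
        if not a["alive"]:
            continue
        power, coop, preserve = a["values"]
        if rng.random() < power / (power + coop + preserve):
            # Aggression: a random other actor loses strength, the aggressor gains a little.
            targets = [b for b in actors if b is not a and b["alive"]]
            if targets:
                t = rng.choice(targets)
                t["strength"] -= 0.3
                a["strength"] += 0.1
                if t["strength"] <= 0:
                    t["alive"] = False
        else:
            # Cooperation: every surviving actor gains a little.
            for b in actors:
                if b["alive"]:
                    b["strength"] += 0.05

def run_history(value_profiles, years=75, seed=0):
    """Re-run the toy '1945 onwards' world with the given value profiles and
    count survivors (a crude stand-in for 'not destroyed by 2020')."""
    rng = random.Random(seed)
    actors = [make_actor(f"actor{i}", v) for i, v in enumerate(value_profiles)]
    for _ in range(years):
        step(actors, rng)
    return sum(a["alive"] for a in actors)

# Compare a 'human-like' weighting to a hypothetical alternative that weights
# preservation of others far more heavily.
human_like  = [(0.5, 0.3, 0.2)] * 5
alternative = [(0.2, 0.3, 0.5)] * 5
print("human-like survivors: ", run_history(human_like))
print("alternative survivors:", run_history(alternative))
```

The interesting version of this would, of course, replace the made-up dynamics with something actually grounded in history, and would run under the sandbox precautions mentioned above.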
why do we even believe that human values are good?
Because they constitute, by definition, our goodness criterion? It’s not like we have two separate modules, one for “human values” and one for “is this good?”. (ETA: or are you pointing out that our values might shift over time as we reflect on our meta-ethics?)
Perhaps typical human behaviour, amplified by the capabilities of a superintelligence, would actually destroy the universe.
If I understand correctly, this is “are human behaviors catastrophic?”—not “are human values catastrophic?”.
On the latter point (values vs. behaviours): yes, whether human behaviours are catastrophic is part of the question, but not the whole question. See the addendum.
On the former point (that human values are, by definition, our goodness criterion): technically not true. If we take “human values” to mean “values averaged across different humans” (not necessarily by arithmetical mean, of course), they may be vastly different from any one person’s “is this good from my viewpoint?”. See the toy illustration below.
On the bracketed part (the ETA): yeah, that too. And our current morals may not be all that good when judged by our metamorals.
Again, I want to underscore that I mention this as a theoretical possibility that is not so improbable as to be unworthy of consideration, not as an unavoidable fact.
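To illustrate the averaging point above, here is a toy example with purely invented numbers: an option picked by the arithmetical mean of everyone’s scores can be nobody’s individual favourite, and a different aggregation rule (here, a simple plurality of favourites) can pick something else again.

```python
from collections import Counter

# All numbers below are invented, purely to illustrate the aggregation point.
people = {
    "alice":   {"A": 10, "B": 8, "C": 1},
    "bob":     {"A": 1,  "B": 8, "C": 10},
    "charlie": {"A": 9,  "B": 8, "C": 2},
}
options = ["A", "B", "C"]

def best_by_mean():
    """Aggregate by the arithmetical mean of everyone's scores."""
    mean = {o: sum(p[o] for p in people.values()) / len(people) for o in options}
    return max(mean, key=mean.get)

def best_by_plurality():
    """Aggregate by counting each person's single favourite option."""
    favourites = Counter(max(vals, key=vals.get) for vals in people.values())
    return favourites.most_common(1)[0][0]

print("individual favourites:",
      {name: max(vals, key=vals.get) for name, vals in people.items()})
print("mean-aggregated choice:     ", best_by_mean())       # 'B': nobody's favourite
print("plurality-aggregated choice:", best_by_plurality())  # 'A'
```

Under the mean, the aggregate endorses an option that no individual ranks first; under plurality, bob gets the option he ranks last. Either way, “human values” in the averaged sense and “is this good from my viewpoint?” can come apart.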