This led me to think… why do we even believe that human values are good? Perhaps typical human behaviour, amplified by the capabilities of a superintelligence, would actually destroy the universe. I don’t personally find this very likely (that’s why I never posted it before), but, given that almost all of AI safety is built, one way or another, around “how do we check that the AI’s values converge with human values?”, perhaps something else is worth trying: for example, remodeling history (actual, human history) from a given starting point (say, the Roman Principatus or 1945) with the actors assigned values different from human values (but standing in similar relationships to each other, where applicable), and seeing which assignments lead to better results (in particular, to us not being destroyed by 2020). All with the usual sandbox precautions, of course.
(Addendum: Of course, pace “fragility of value”, there should still be some inheritance from our metamorals. But we don’t actually know how compatible our morals (and the systems in “reliable inheritance” from them) are with our metamorals, especially in an environment as extreme as superintelligence.)
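To make the “remodeling history” idea slightly more concrete, here is a deliberately crude toy sketch in Python. Everything in it (the actor structure, the three value dimensions, the aggression/cooperation dynamic, the survival score) is invented purely for illustration; an actual simulation of this kind would have to look very different.

```python
import random

# Toy sketch only: actors are crude stand-ins for post-1945 powers, each with a
# weighting over three invented value dimensions (power, cooperation,
# preservation of others). Nothing here is a claim about real history.

def make_actor(name, values, strength=1.0):
    return {"name": name, "values": values, "strength": strength, "alive": True}

def step(actors, rng):
    """One 'year': each actor aggresses or cooperates, with probability set by
    how heavily it weights power relative to its other values."""
    for a in actors:
        if not a["alive"]:
            continue
        power, coop, preserve = a["values"]
        if rng.random() < power / (power + coop + preserve):
            # Aggression: a random other actor loses strength, the aggressor gains a little.
            targets = [b for b in actors if b is not a and b["alive"]]
            if targets:
                t = rng.choice(targets)
                t["strength"] -= 0.3
                a["strength"] += 0.1
                if t["strength"] <= 0:
                    t["alive"] = False
        else:
            # Cooperation: every surviving actor gains a little.
            for b in actors:
                if b["alive"]:
                    b["strength"] += 0.05

def run_history(value_profiles, years=75, seed=0):
    """Re-run the toy '1945 onwards' world with the given value profiles and
    count survivors (a crude stand-in for 'not destroyed by 2020')."""
    rng = random.Random(seed)
    actors = [make_actor(f"actor{i}", v) for i, v in enumerate(value_profiles)]
    for _ in range(years):
        step(actors, rng)
    return sum(a["alive"] for a in actors)

# Compare a 'human-like' weighting to a hypothetical alternative that weights
# preservation of others far more heavily.
human_like  = [(0.5, 0.3, 0.2)] * 5
alternative = [(0.2, 0.3, 0.5)] * 5
print("human-like survivors: ", run_history(human_like))
print("alternative survivors:", run_history(alternative))
```

The interesting version of this would, of course, replace the made-up dynamics with something actually grounded in history, and would run under the sandbox precautions mentioned above.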
why do we even believe that human values are good?
Because they constitute, by definition, our goodness criterion? It’s not like we have two separate modules, one for “human values” and one for “is this good?”. (ETA: or are you pointing out that our values might shift over time as we reflect on our meta-ethics?)
Perhaps typical human behaviour, amplified by the capabilities of a superintelligence, would actually destroy the universe.
If I understand correctly, this is “are human behaviors catastrophic?”—not “are human values catastrophic?”.
On the latter point (values vs. behaviours): yes, whether human behaviours are catastrophic is part of the question, but not the whole question. See the addendum.
On the former point (that human values are, by definition, our goodness criterion): technically not true. If we take “human values” to mean “values averaged across different humans” (not necessarily by arithmetical mean, of course), they may be vastly different from any one person’s “is this good from my viewpoint?”. See the toy illustration below.
On the bracketed part (the ETA): yeah, that too. And our current morals may not be all that good when judged by our metamorals.
Again, I want to underscore that I mention this as a theoretical possibility that is not so improbable as to be unworthy of consideration, not as an unavoidable fact.
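To illustrate the averaging point above, here is a toy example with purely invented numbers: an option picked by the arithmetical mean of everyone’s scores can be nobody’s individual favourite, and a different aggregation rule (here, a simple plurality of favourites) can pick something else again.

```python
from collections import Counter

# All numbers below are invented, purely to illustrate the aggregation point.
people = {
    "alice":   {"A": 10, "B": 8, "C": 1},
    "bob":     {"A": 1,  "B": 8, "C": 10},
    "charlie": {"A": 9,  "B": 8, "C": 2},
}
options = ["A", "B", "C"]

def best_by_mean():
    """Aggregate by the arithmetical mean of everyone's scores."""
    mean = {o: sum(p[o] for p in people.values()) / len(people) for o in options}
    return max(mean, key=mean.get)

def best_by_plurality():
    """Aggregate by counting each person's single favourite option."""
    favourites = Counter(max(vals, key=vals.get) for vals in people.values())
    return favourites.most_common(1)[0][0]

print("individual favourites:",
      {name: max(vals, key=vals.get) for name, vals in people.items()})
print("mean-aggregated choice:     ", best_by_mean())       # 'B': nobody's favourite
print("plurality-aggregated choice:", best_by_plurality())  # 'A'
```

Under the mean, the aggregate endorses an option that no individual ranks first; under plurality, bob gets the option he ranks last. Either way, “human values” in the averaged sense and “is this good from my viewpoint?” can come apart.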