Disentangling Our Terminal and Instrumental Values
Disagreements related to what we value seem to explain maybe 10% of the disagreements over AI safety. This post will try to explain how I think about which values I care about perpetuating to the distant future.
Robin Hanson helped to clarify the choices in Which Of Your Origins Are You?:
The key hard question here is this: what aspects of the causal influences that lead to you do you now embrace, and which do you instead reject as “random” errors that you want to cut out? Consider two extremes.
At one extreme, one could endorse absolutely every random element that contributed to any prior choice or intuition.
...
At the other extreme, you might see yourself as primarily the result of natural selection, both of genes and of memes, and see your core non-random value as that of doing the best you can to continue to “win” at that game. … In this view, everything about you that won’t help your descendants be selected in the long run is a random error that you want to detect and reject.
In other words, the more idiosyncratic our criteria are for what we want to preserve into the distant future, the less we should expect to succeed.
Do Humans Have Many Terminal Values?
The Elephant in the Brain shows that we portray ourselves as having a wide variety of noble values, a large fraction of which are better explained by simple selfishness.
Self-oriented goals related to wealth, prestige, dominance, safety, and health can be largely explained as results of us valuing our own empowerment. The post Empowerment is (almost) All We Need expands on this idea.
Shard Theory tells us that most human values are context-dependent.
Those considerations lead me to conclude that most human values are better explained as instrumental values (i.e. subgoals) rather than as terminal values.
But before I go overboard in that direction, I need to remind myself that evolution produces messy results that can’t be classified as simply as I’d like. Take sex, for example: evolution intended sex to be an instrumental value that helps to spread genes.
Yet evolution implemented sex drives in a way that caused humans to often treat sex more as a terminal goal. Is sex a genuinely terminal goal? Or is it a goal that’s contingent on certain hormones and sensory organs? Like most of my values, the answer seems context-dependent. Transhumanist culture seems to guide me toward treating sex as an instrumental value, whereas if I lived in a lower-tech, sex-positive culture, I’d treat sex drives as sufficiently inevitable that I couldn’t distinguish sex from a terminal goal.
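Here’s a minimal sketch of that distinction as a toy goal hierarchy. It’s only an illustration; the names and numbers are made up, and nothing in my argument depends on the details. A purely instrumental value inherits all of its worth from the goal it serves, while a value with any terminal component keeps some worth even when it stops serving anything.

```python
# Toy model: a value is "terminal" to the extent it has worth of its own,
# and "instrumental" to the extent its worth is inherited from a goal it serves.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Goal:
    name: str
    intrinsic: float = 0.0            # worth assigned to this goal for its own sake
    serves: Optional["Goal"] = None   # parent goal this one is instrumental toward
    usefulness: float = 0.0           # how much pursuing this goal advances the parent

    def value(self) -> float:
        # Inherited worth disappears if the goal stops serving anything;
        # intrinsic worth does not.
        inherited = self.usefulness * self.serves.value() if self.serves else 0.0
        return self.intrinsic + inherited


fitness = Goal("reproductive fitness", intrinsic=1.0)
wealth = Goal("wealth", serves=fitness, usefulness=0.4)           # purely instrumental
sex = Goal("sex", intrinsic=0.3, serves=fitness, usefulness=0.5)  # messy: partly both

print(wealth.value())  # 0.4 -> drops to 0.0 if its usefulness goes to 0
print(sex.value())     # 0.8 -> keeps 0.3 even if it stops serving fitness
```

The point of the toy model is that a drive like sex can sit in between: zeroing out the goal it serves doesn’t zero out its worth, which is why I find it hard to treat “instrumental vs. terminal” as a binary classification.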
Human values haven’t evolved to fit neatly into a framework in which instrumental and terminal values are cleanly separate categories.
So there is some complexity to human terminal values. But I don’t see how those values differ much from those of simpler animals. I claim that virtually all human values that differ from those of other species should be classified as instrumental values.
Identity
Robin says in AIs Will Be Our Mind Children:
future human-level AIs are not co-existing competing aliens; they are instead literally our descendants. So if your evolved instincts tell you to fight your descendants due to their strangeness, that is a huge evolutionary mistake.
I’m bothered by the binary language that leads both Robin and the AI doomers to sound as if AIs will be either 100% or 0% our descendants.
I care how much of my genes (loosely defined so as to include culture) are embodied in our AI successors.
If I believed Eliezer’s model of how AI will function, I’d be almost as pessimistic as he is about AIs sharing my “genes”. My model of AI is a bit closer to Robin’s model. My best guess is that at least some AIs will be sufficiently human-like that they preserve some of what I value.
But in addition to that factual disagreement about how much AIs will resemble me, there’s an important value difference about how we should feel about AIs that perpetuate modest fractions of our “genes”.
Robin is expressing values that roughly correspond to what we should expect from evolved minds. But he sometimes sounds like a more altruistic utilitarian than is consistent with evolutionary values. (He sounds more worried about alien values when talking about the risk that the world will be dominated by Amish and Orthodox Jews.)
Eliezer seems to be expressing values that look more like the result of goal displacement (subgoal stomp), in which preservation of identity / values (presumably originating as a subgoal of empowerment or selective fitness) becomes a terminal goal.
That doesn’t mean Eliezer’s values are wrong. His values are harder to satisfy than are Robin’s. So to the extent my preferences about descendants are subgoals that are amenable to change, I’ll prefer to nudge them a bit in Robin’s direction. In particular, I’ll try to be a bit more accepting of uploads that imperfectly preserve my identity.
I’m not too clear about the extent to which my values about descendants are subgoals or terminal goals. But since I see a pervasive human bias to overstate the extent to which our goals are terminal goals, I’m going to try to think about my values a bit more like Robin thinks, and a bit less like Eliezer thinks.
Uncertainty
Here are two analogies for why I’m concerned about aligning AI:
Mothers are quite selective about whose genes fertilize their eggs.
Fathers are often concerned about the paternity of the children that they’re supporting.
Robin sometimes sounds as if he’s rejecting these instincts.
Current AI efforts are on track to produce a somewhat arbitrary mix of whatever values are easiest to implement, and the values of the most influential AI developers.
To the extent AI values are chosen by ease of implementation, I reject the notion that those AIs are our descendants.
To the extent AI values replicate the culture of influential people, I feel a more normal mix of approval that people like me are reproducing, and concern about whether those descendants embody too much of other people’s values and not enough of mine.
I’d prefer an Age of Em to whatever it is we’re currently on track for. Alas, I have no clear plan for getting to a future in which ems matter more than the more purely artificial AIs.