it’s improbable that we end up with something close to human values
I think the statement is essentially true, but it turns on the semantics of “human”. In today’s world we probably haven’t wound up with something close to 50,000BC!human values, and we certainly don’t have Neanderthal values, but we don’t regret that, do we?
Put another way, I am skeptical of our authority to pass judgement on the values of a civilization which is by hypothesis far more advanced than our own.
Does that mean you’re familiar with Robin Hanson’s “Malthusian upload” / “burning the cosmic commons” scenario but do not think it’s a particularly bad outcome?
To be honest, I wasn’t familiar with either of those names, but I have explicitly thought about both those scenarios and concluded that I don’t think they’re particularly bad.
All right, fair enough!
I’d guess that’s been tried already, given that Ben was the Director of Research for SIAI (and technically Eliezer’s boss) for a number of years.
Put another way, I am skeptical of our authority to pass judgement on the values of a civilization which is by hypothesis far more advanced than our own.
Are you literally saying that we lack the moral authority to judge the relative merits of future civilizations, as long as they are significantly more technologically advanced than ours, or is it more that you are judging them based mainly on how technologically advanced they are?
For example, consider an upload singleton that takes over the world and then decides to stop technological progress somewhere short of what the universe could support in order to maximize its stability. Would you judge that to be worse than other possible outcomes?
If your answers are “the latter” and “yes”, can you explain what makes technology such a great thing, compared to say pleasure, or happiness, or lack of suffering? Are you really virtually indifferent between a future light-cone filled with happiness and pleasure and entirely free from suffering, and one filled with people/AIs struggling just to survive and reproduce (assuming they have similar levels of technology)? (Edit: The former isn’t necessarily the best possible outcome, but just one that seems clearly better and is easy to describe.)
My answers are indeed “the latter” and “yes”. There are a couple ways I can justify this.
The first way is just to assert that from a standard utilitarian perspective, over the long term, technological progress is a fairly good indicator for lack of suffering (e.g. Europe vs. Africa). [Although arguments have been made that happiness has gone down since 1950 while technology has gone up, I see the latter half of the 20th century as a bit of a “dark age”, analogous to the fall of antiquity (we forgot how to get to the moon!), which will be reversed in due time.]
The second is that I challenge you to define “pleasure,” “happiness,” or “lack of suffering.” You may challenge me to define “technological progress,” but I can just point you to sophistication or integrated information as reasonable proxies. As vague as notions of “progress” and “complexity” are, I assert that they are decidedly less vague than notions of “pleasure” and “suffering”. To support this claim, note that sophistication and integrated information can be defined and evaluated without a normative partition of the universe into a discrete set of entities, whereas pleasure and suffering cannot. So the pleasure metric leads to lots of weird paradoxes. Finally, self-modifying superintelligences must necessarily develop a fundamentally different concept of pleasure than we do (otherwise they just wirehead), so the pleasure metric probably cannot be straightforwardly applied to their situation anyway.
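As a rough gloss, and not necessarily the formulations davidad has in mind, both proxies do admit standard mathematical definitions: set-based sophistication measures the complexity of the “structured part” of an object x, and integrated information measures (schematically) how much a system’s causal dynamics exceed those of its weakest bipartition:

\mathrm{soph}_c(x) \;=\; \min\{\, K(S) : x \in S,\ K(S) + \log_2\lvert S\rvert \le K(x) + c \,\}

\Phi(X) \;\approx\; \min_{\{A,B\}\ \text{bipartition of}\ X} \operatorname{EI}(A \leftrightarrow B)

Here K is Kolmogorov complexity, c is a significance parameter, and EI is effective information; the second line is only schematic, since the exact definition of Φ varies between versions of integrated information theory.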
The first way is just to assert that from a standard utilitarian perspective, over the long term, technological progress is a fairly good indicator for lack of suffering (e.g. Europe vs. Africa).
What about hunter-gatherers vs farmers? And a universe devoid of both life and technology would have even less suffering than either.
The second is that I challenge you to define “pleasure,” “happiness,” or “lack of suffering.”
Can you explain why you’re giving me this challenge? Because I don’t understand: if I can only define them vaguely, how does that strengthen your case that we should care about technology and not these values? Suppose I told you that I want to maximize the smoothness of the universe, because that’s even easier to define than “technology”. Wouldn’t you think that’s absurd?
Edit: Also, could you clarify whether you value technology as an end in itself, or just as a proxy for your real values, which perhaps you can’t easily define but might be something like “life being good”?
The second is that I challenge you to define “pleasure,” “happiness,” or “lack of suffering.”
Can you explain why you’re giving me this challenge? Because I don’t understand: if I can only define them vaguely, how does that strengthen your case that we should care about technology and not these values?
As far as I understand him, he is saying that technological progress can be quantified, while your ideas of how to rate world states either can’t be quantified, and therefore can’t be used for rating, or run into problems and contradictions.
He further seems to believe that technological progress leads to “complexity”, which in turn leads to other kinds of values. Even if they are completely alien to us humans and our values, they will still be intrinsically valuable.
His view of a universe where an “unfriendly” AI takes over is one where there will be a society of paperclip maximizers and their offspring. Those AIs will not only diverge from maximizing paperclips and evolve complex values, but also pursue various instrumental goals, as exploration will never cease. And pursuing those goals will satisfy their own concept of pleasure.
And he believes that having such a culture of paperclip maximizers having fun while pursuing their goals isn’t less valuable than having our current volition extrapolated, which might end up being similarly alien to our current values.
In other words, there is one thing that we can rate, and that is complexity. If we can increase it then we should do so. Never mind the outcome; it will be good.
Correct me if I misinterpreted anything.
I couldn’t have said it better myself.
Would you change your mind if I could give a precise definition of, say, “suffering”, and showed you two paths to the future that end up with similar levels of technology but different amounts of suffering? I’ll assume the answer is yes, because otherwise why did you give me that challenge?
What if I said that I don’t know how to define it now, but I think if you made me a bit (or a lot) smarter and gave me a few decades of subjective time to work on the problem, I could probably give you such a definition and tell you how to achieve the “less suffering, same tech” outcome? Would you be willing to give me that chance (assuming it was in your power to do so)? Or are you pretty sure that “suffering” is not just hard to define, but actually impossible, and/or that it’s impossible to reduce suffering to any significant extent below the default outcome, while keeping technology at the same level? If you are pretty sure about this, are you equally sure about every other value that I could cite instead of suffering?
Or are you pretty sure that “suffering” is not just hard to define, but actually impossible, and/or that it’s impossible to reduce suffering to any significant extent below the default outcome, while keeping technology at the same level?
Masochist: Please hurt me!
Sadist: No.
If you are pretty sure about this, are you equally sure about every other value that I could cite instead of suffering?
Not sure, but it might be impossible.
What if I said that I don’t know how to define it now, but I think if you made me a bit (or a lot) smarter...
If you were to uplift a chimpanzee to the human level and tell it to figure out how to reduce suffering for chimpanzees, it would probably come up with ideas like democracy, health insurance and supermarkets. The problem is that chimpanzees wouldn’t appreciate those ideas...
XiXiDu, I’m aware that I’m hardly making a watertight case that I can definitely do better than davidad’s plan (from the perspective of his current apparent values). I’m merely trying to introduce some doubt. (Note how Eliezer used to be a technophile like David, and said things like “But if it comes down to Us or Them, I’m with Them.”, but then changed his mind.)
What do you think of this passage from Yudkowsky (2011)?
To speak of building an AGI which shares “our values” is likely to provoke negative reactions from any AGI researcher whose current values include terms for respecting the desires of future sentient beings and allowing them to self-actualize their own potential without undue constraint. This itself, of course, is a
The complete quote is:
To speak of building an AGI which shares “our values” is likely to provoke negative reactions from any AGI researcher whose current values include terms for respecting the desires of future sentient beings and allowing them to self-actualize their own potential without undue constraint. This itself, of course, is a component of the AGI researcher’s preferences which would not necessarily be shared by all powerful optimization processes, just as natural selection doesn’t care about old elephants starving to death or gazelles dying in pointless agony. Building an AGI which shares, quote, “our values”, unquote, sounds decidedly non-cosmopolitan, something like trying to rule that future intergalactic civilizations must be composed of squishy meat creatures with ten fingers or they couldn’t possibly be worth anything—and hence, of course, contrary to our own cosmopolitan values, i.e., cosmopolitan preferences. The counterintuitive idea is that even from a cosmopolitan perspective, you cannot take a hands-off approach to the value systems of AGIs; most random utility functions result in sterile, boring futures because the resulting agent does not share our own intuitions about the importance of things like novelty and diversity, but simply goes off and e.g. tiles its future lightcone with paperclips, or other configurations of matter which seem to us merely “pointless”.
I like the concept of a reflective equilibrium, and it seems to me like that is just what any self-modifying AI would tend toward. But the notion of a random utility function, or the “structured utility function” Eliezer proposes as a replacement, assumes that an AI is composed of two components, the intelligent bit and the bit that has the goals. Humans certainly can’t be factorized in that way. Just think about akrasia to see how fragile the notion of a goal is.
Even notions of being “cosmopolitan”—of not selfishly or provincially constraining future AIs—are written down nowhere in the universe except a handful of human brains. An expected paperclip maximizer would not bother to ask such questions.
A smart expected paperclip maximizer would realize that it may not be the smartest possible expected paperclip maximizer—that other ways of maximizing expected paperclips might lead to even more paperclips. But the only way it would find out about those is to spawn modified expected paperclip maximizers and see what they can come up with on their own. Yet, those modified paperclip maximizers might not still be maximizing paperclips! They might have self-modified away from that goal, and just be signaling their interest in paperclips to gain the approval of the original expected paperclip maximizer. Therefore, the original expected paperclip maximizer had best not take that risk after all (leaving it open to defeat by a faster-evolving cluster of AIs). This, by reductio ad absurdum, is why I don’t believe in smart expected paperclip maximizers.
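To make the trade-off in this argument explicit (a rough gloss with made-up symbols, not anything davidad states): let q be the probability that a modified successor genuinely retains the paperclip goal rather than merely signaling it, and r the probability that a maximizer which refuses to spawn successors is eventually overtaken by a faster-evolving cluster. Then, roughly,

\mathbb{E}[\text{clips} \mid \text{spawn successors}] \;=\; q\,C_{\text{better}} + (1-q)\,C_{\text{drift}}, \qquad \mathbb{E}[\text{clips} \mid \text{go it alone}] \;=\; (1-r)\,C_{\text{solo}} + r\,C_{\text{defeated}},

with C_drift and C_defeated both far below C_solo ≤ C_better. The reductio is that the maximizer can neither push q high enough to make delegation safe nor accept the exposure that comes with refusing to delegate.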
Humans aren’t factorized this way; whether they can’t be is a separate question. It’s not surprising that evolution’s design isn’t that neat, so the fact that humans don’t have this property is only weak evidence about the possibility of designing systems that do have this property.
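For concreteness, here is a minimal toy sketch of the factorization being discussed (purely illustrative Python; none of these names or numbers come from the thread): a generic “intelligence” component, an expected-utility planner, that works unchanged regardless of which “goal” component, a utility function, is plugged into it. Whether anything this clean can be preserved in a system as capable as a human, let alone an AGI, is the open question.

from typing import Callable, Dict, List

Outcome = str
Action = str
UtilityFn = Callable[[Outcome], float]  # the swappable "goal" component


def expected_utility_planner(
    actions: List[Action],
    model: Dict[Action, Dict[Outcome, float]],  # P(outcome | action), the agent's world model
    utility: UtilityFn,
) -> Action:
    """Generic 'intelligence' component: pick the action with highest expected utility."""
    def eu(action: Action) -> float:
        return sum(p * utility(outcome) for outcome, p in model[action].items())
    return max(actions, key=eu)


# The same planner driven by two different goal modules (hypothetical toy numbers):
model = {
    "make_paperclips": {"many_paperclips": 0.9, "few_paperclips": 0.1},
    "write_poetry": {"nice_poem": 0.8, "bad_poem": 0.2},
}

def paperclip_utility(outcome: Outcome) -> float:
    return 1.0 if outcome == "many_paperclips" else 0.0

def poetry_utility(outcome: Outcome) -> float:
    return 1.0 if outcome == "nice_poem" else 0.0

print(expected_utility_planner(list(model), model, paperclip_utility))  # -> make_paperclips
print(expected_utility_planner(list(model), model, poetry_utility))     # -> write_poetry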