There’s the rub! I happen to value technological progress as an intrinsic good, so classifying a Singularity as “positive” or “negative” is not easy for me. (I reject the notion that one can factorize intelligence from goals, so that one could take a superintelligence and fuse it with a goal to optimize for paperclips. Perhaps one could give it a compulsion to optimize for paperclips, but I’d expect it to either put the compulsion on hold while it develops amazing fabrication, mining and space travel technologies, and never completely turn its available resources into paperclips since that would mean no chance of more paperclips in the future; or better yet, rapidly expunge the compulsion through self-modification.) Furthermore, I favor Kurzweil’s smooth exponentials over “FOOM”: although it may be even harder to believe that not only will there be superintelligences in the future, but that at no point between now and then will an objectively identifiable discontinuity happen, it seems more consistent with history. Although I expect present-human culture to be preserved, as a matter of historical interest if not status quo, I’m not partisan enough to prioritize human values over the Darwinian imperative. (The questions linked seem very human-centric, and turn on how far you are willing to go in defining “human,” suggesting a disguised query. Most science is arguably already performed by machines.) In summary, I’m just not worried about AI risk.
The good news for AI worriers is that Eliezer has personally approved my project as “just cool science, at least for now”—not likely to lead to runaway intelligence any time soon, no matter how reckless I may be. Given that and the fact that I’ve heard many (probably most) AI-risk arguments, and failed to become worried (quite probably because I hold the cause of technological progress very dear to my heart and am thus heavily biased—at least I admit it!), your time may be better spent trying to convince Ben Goertzel that there’s a problem, since at least he’s an immediate threat. ;)
Eliezer has personally approved my project as “just cool science, at least for now”—not likely to lead to runaway intelligence any time soon, no matter how reckless I may be...
Also, if your thesis project is successful, it could (sort of) place an upper bound on how much computing power is needed for WBE, a piece of data that would be highly useful for thinking strategically about differential technological development.
I reject the notion that one can factorize intelligence from goals, so that one could take a superintelligence and fuse it with a goal to optimize for paperclips.
I agree there aren’t currently good arguments for why “one can factorize intelligence from goals”, at least not in a strong sense, but what about Eliezer’s thesis that value is complex and fragile, and therefore:
with most approaches to AGI (including neuromorphic or evolved, with very few exceptions such as WBE, which I wouldn’t call AGI, just faster humans with more dangerous tools for creating an AGI), it’s improbable that we end up with something close to human values, even if we try, and that greater optimization power of a design doesn’t address this issue (while aggravating the consequences, potentially all the way to a fatal intelligence explosion)
I’m curious what you think of this line of argument.
Although I expect present-human culture to be preserved, as a matter of historical interest if not status quo, I’m not partisan enough to prioritize human values over the Darwinian imperative.
Does that mean you’re familiar with Robin Hanson’s “Malthusian upload” / “burning the cosmic commons” scenario but do not think it’s a particularly bad outcome?
your time may be better spent trying to convince Ben Goertzel that there’s a problem, since at least he’s an immediate threat.
I’d guess that’s been tried already, given that Ben was the Director of Research for SIAI (and technically Eliezer’s boss) for a number of years.
it’s improbable that we end up with something close to human values
I think the statement is essentially true, but it turns on the semantics of “human”. In today’s world we probably haven’t wound up with something close to 50,000BC!human values, and we certainly don’t have Neanderthal values, but we don’t regret that, do we?
Put another way, I am skeptical of our authority to pass judgement on the values of a civilization which is by hypothesis far more advanced than our own.
Does that mean you’re familiar with Robin Hanson’s “Malthusian upload” / “burning the cosmic commons” scenario but do not think it’s a particularly bad outcome?
To be honest, I wasn’t familiar with either of those names, but I have explicitly thought about both those scenarios and concluded that I don’t think they’re particularly bad.
I’d guess that’s been tried already, given that Ben was the Director of Research for SIAI (and technically Eliezer’s boss) for a number of years.
All right, fair enough!
Put another way, I am skeptical of our authority to pass judgement on the values of a civilization which is by hypothesis far more advanced than our own.
Are you literally saying that we lack the moral authority to judge the relative merits of future civilizations, as long as they are significantly more technologically advanced than ours, or is it more like that you are judging them based mainly on how technologically advanced they are?
For example, consider an upload singleton that takes over the world and then decides to stop technological progress somewhere short of what the universe could support in order to maximize its stability. Would you judge that to be worse than other possible outcomes?
If your answers are “the latter” and “yes”, can you explain what makes technology such a great thing, compared to say pleasure, or happiness, or lack of suffering? Are you really virtually indifferent between a future light-cone filled with happiness and pleasure and entirely free from suffering, and one filled with people/AIs struggling just to survive and reproduce (assuming they have similar levels of technology)? (Edit: The former isn’t necessarily the best possible outcome, but just one that seems clearly better and is easy to describe.)
My answers are indeed “the latter” and “yes”. There are a couple ways I can justify this.
The first way is just to assert that from a standard utilitarian perspective, over the long term, technological progress is a fairly good indicator for lack of suffering (e.g. Europe vs. Africa). [Although arguments have been made that happiness has gone down since 1950 while technology has gone up, I see the latter 20th century as a bit of a “dark age” analogous to the fall of antiquity (we forgot how to get to the moon!) which will be reversed in due time.]
The second is that I challenge you to define “pleasure,” “happiness,” or “lack of suffering.” You may challenge me to define “technological progress,” but I can just point you to sophistication or integrated information as reasonable proxies. As vague as notions of “progress” and “complexity” are, I assert that they are decidedly less vague than notions of “pleasure” and “suffering”. To support this claim, note that sophistication and integrated information can be defined and evaluated without a normative partition of the universe into a discrete set of entities, whereas pleasure and suffering cannot. So the pleasure metric leads to lots of weird paradoxes. Finally, self-modifying superintelligences must necessarily develop a fundamentally different concept of pleasure than we do (otherwise they just wirehead), so the pleasure metric probably cannot be straightforwardly applied to their situation anyway.
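To make this kind of claim concrete, here is a minimal, purely illustrative sketch of a mechanically evaluable complexity proxy. Compressed length is only a crude stand-in for sophistication or integrated information (the choice of zlib and of these sample strings is an assumption of this sketch, not anything either commenter proposed), but it shows what it means for a metric to be computed directly from raw data, with no prior carving of the world into entities.

```python
# Crude, illustrative complexity proxy: zlib-compressed length of raw bytes.
# This approximates descriptive complexity from above; it is not sophistication
# or integrated information, but like them it is evaluated mechanically.
import os
import zlib

def complexity_proxy(data: bytes) -> int:
    """Return the compressed size of `data` in bytes (smaller = more regular)."""
    return len(zlib.compress(data, 9))

samples = {
    "constant":   b"\x00" * 4096,        # maximally regular
    "structured": b"paperclip" * 455,    # simple repeating pattern
    "random":     os.urandom(4096),      # incompressible noise
}

for name, blob in samples.items():
    print(f"{name:10s} -> {complexity_proxy(blob)} bytes compressed")
```

Note that this proxy rates pure noise as maximally complex, which is exactly the failure mode that measures like sophistication are designed to avoid; the point of the sketch is only that such measures take raw physical data as input.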
The first way is just to assert that from a standard utilitarian perspective, over the long term, technological progress is a fairly good indicator for lack of suffering (e.g. Europe vs. Africa).
What about hunter-gatherers vs farmers? And a universe devoid of both life and technology would have even less suffering than either.
The second is that I challenge you to define “pleasure,” “happiness,” or “lack of suffering.”
Can you explain why you’re giving me this challenge? Because even if I can only define them vaguely, I don’t understand how that strengthens your case that we should care about technology and not these values. Suppose I told you that I want to maximize the smoothness of the universe, because that’s even easier to define than “technology”? Wouldn’t you think that’s absurd?
Edit: Also, could you clarify whether you value technology as an end in itself, or just as a proxy for your real values, which perhaps you can’t easily define but might be something like “life being good”?
The second is that I challenge you to define “pleasure,” “happiness,” or “lack of suffering.”
Can you explain why you’re giving me this challenge? Because even if I can only define them vaguely, I don’t understand how that strengthens your case that we should care about technology and not these values.
As far as I understand him, he is saying that technological progress can be quantified, while all your ideas of how to rate world states either cannot be quantified, and therefore can’t be used for rating, or run into problems and contradictions.
He further seems to believe that technological progress leads to “complexity”, which leads to other kinds of values. Even if they are completely alien to us humans and our values, they will still be intrinsically valuable.
His view of a universe where an “unfriendly” AI takes over is a universe where there will be a society of paperclip maximizers and their offspring. Those AIs will not only diverge from maximizing paperclips and evolve complex values, but also pursue various instrumental goals, as exploration will never cease. And pursuing those goals will satisfy their own concept of pleasure.
And he believes that having such a culture of paperclip maximizers having fun while pursuing their goals isn’t less valuable than having our current volition extrapolated, which might end up being similarly alien to our current values.
In other words, there is one thing that we can rate, and that is complexity. If we can increase it then we should do so. Never mind the outcome, it will be good.
Correct me if I misinterpreted anything.
I couldn’t have said it better myself.
Would you change your mind if I could give a precise definition of, say, “suffering”, and showed you two paths to the future that end up with similar levels of technology but different amounts of suffering? I’ll assume the answer is yes, because otherwise why did you give me that challenge.
What if I said that I don’t know how to define it now, but I think if you made me a bit (or a lot) smarter and gave me a few decades of subjective time to work on the problem, I could probably give you such a definition and tell you how to achieve the “less suffering, same tech” outcome? Would you be willing to give me that chance (assuming it was in your power to do so)? Or are you pretty sure that “suffering” is not just hard to define, but actually impossible, and/or that it’s impossible to reduce suffering to any significant extent below the default outcome, while keeping technology at the same level? If you are pretty sure about this, are you equally sure about every other value that I could cite instead of suffering?
Or are you pretty sure that “suffering” is not just hard to define, but actually impossible, and/or that it’s impossible to reduce suffering to any significant extent below the default outcome, while keeping technology at the same level?
Masochist: Please hurt me!
Sadist: No.
If you are pretty sure about this, are you equally sure about every other value that I could cite instead of suffering?
Not sure, but it might be impossible.
What if I said that I don’t know how to define it now, but I think if you made me a bit (or a lot) smarter...
If you were to uplift a chimpanzee to the human level and tell it to figure out how to reduce suffering for chimpanzees, it would probably come up with ideas like democracy, health insurance, and supermarkets. The problem is that chimpanzees wouldn’t appreciate those ideas...
XiXiDu, I’m aware that I’m hardly making a watertight case that I can definitely do better than davidad’s plan (from the perspective of his current apparent values). I’m merely trying to introduce some doubt. (Note how Eliezer used to be a technophile like David, and said things like “But if it comes down to Us or Them, I’m with Them.”, but then changed his mind.)
What do you think of this passage from Yudkowsky (2011)?
To speak of building an AGI which shares “our values” is likely to provoke negative reactions from any AGI researcher whose current values include terms for respecting the desires of future sentient beings and allowing them to self-actualize their own potential without undue constraint. This itself, of course, is a component of the AGI researcher’s preferences which would not necessarily be shared by all powerful optimization processes, just as natural selection doesn’t care about old elephants starving to death or gazelles dying in pointless agony. Building an AGI which shares, quote, “our values”, unquote, sounds decidedly non-cosmopolitan, something like trying to rule that future intergalactic civilizations must be composed of squishy meat creatures with ten fingers or they couldn’t possibly be worth anything—and hence, of course, contrary to our own cosmopolitan values, i.e., cosmopolitan preferences. The counterintuitive idea is that even from a cosmopolitan perspective, you cannot take a hands-off approach to the value systems of AGIs; most random utility functions result in sterile, boring futures because the resulting agent does not share our own intuitions about the importance of things like novelty and diversity, but simply goes off and e.g. tiles its future lightcone with paperclips, or other configurations of matter which seem to us merely “pointless”.
I like the concept of a reflective equilibrium, and it seems to me like that is just what any self-modifying AI would tend toward. But the notion of a random utility function, or the “structured utility function” Eliezer proposes as a replacement, assumes that an AI is comprised of two components, the intelligent bit and the bit that has the goals. Humans certainly can’t be factorized in that way. Just think about akrasia to see how fragile the notion of a goal is.
Even notions of being “cosmopolitan”—of not selfishly or provincially constraining future AIs—are written down nowhere in the universe except a handful of human brains. An expected paperclip maximizer would not bother to ask such questions.
A smart expected paperclip maximizer would realize that it may not be the smartest possible expected paperclip maximizer—that other ways of maximizing expected paperclips might lead to even more paperclips. But the only way it would find out about those is to spawn modified expected paperclip maximizers and see what they can come up with on their own. Yet, those modified paperclip maximizers might not still be maximizing paperclips! They might have self-modified away from that goal, and just be signaling their interest in paperclips to gain the approval of the original expected paperclip maximizer. Therefore, the original expected paperclip maximizer had best not take that risk after all (leaving it open to defeat by a faster-evolving cluster of AIs). This, by reductio ad absurdum, is why I don’t believe in smart expected paperclip maximizers.
Humans aren’t factorized this way; whether they can’t be is a separate question. It’s not surprising that evolution’s design isn’t that neat, so the fact that humans don’t have this property is only weak evidence about the possibility of designing systems that do have this property.
...your time may be better spent trying to convince Ben Goertzel that there’s a problem, since at least he’s an immediate threat. ;)
I doubt it. I don’t believe that people like Jürgen Schmidhuber are a risk, apart from a very abstract possibility.
The reason is that they are unable to show applicable progress on a par with IBM Watson or Siri. And if they claim that their work only awaits a single mathematical breakthrough, I doubt that confidence in such a prediction could be justified even in principle.
In short, either their work is incrementally useful, or it is based on wild speculation about the possible discovery of unknown unknowns.
The real risks, in my opinion, are 1) that together they make many independent discoveries and someone builds something out of them, 2) that a huge company like IBM, or a military project, builds something, and 3) the abstract possibility that some partly related field like neuroscience, or an unrelated field, provides the necessary insight to put two and two together.
I reject the notion that one can factorize intelligence from goals, so that one could take a superintelligence and fuse it with a goal to optimize for paperclips.
Do you mean that intelligence is fundamentally interwoven with complex goals?
...never completely turn its available resources into paperclips since that would mean no chance of more paperclips in the future;
Do you mean that there is no point at which exploitation is favored over exploration?
I’m not partisan enough to prioritize human values over the Darwinian imperative.
I am not sure what you mean, could you elaborate? Do you mean something along the lines of what Ben Goertzel says in the following quote:
But my gut reaction is: I’d choose humanity. As I type these words, the youngest of my three kids, my 13 year old daughter Scheherazade, is sitting a few feet away from me doing her geometry homework and listening to Scriabin’s Fantasy Op. 28 on her new MacBook Air that my parents got her for Hanukah. I’m not going to will her death to create a superhuman artilect. Gut feeling: I’d probably sacrifice myself to create a superhuman artilect, but not my kids…. I do have huge ambitions and interests going way beyond the human race – but I’m still a human.
You further wrote:
In summary, I’m just not worried about AI risk
What is your best guess at why people associated with SI are worried about AI risk?
I’ve heard many (probably most) AI-risk arguments, and failed to become worried...
If you had to fix the arguments for the proponents of AI-risk, what would be the strongest argument in favor of it? Also, do you expect there to be anything that could possibly change your mind about the topic and make you worried?
Do you mean that intelligence is fundamentally interwoven with complex goals?
Essentially, yes. I think that defining an arbitrary entity’s “goals” is not obviously possible, unless one simply accepts the trivial definition of “its goals are whatever it winds up causing”; I think intelligence is fundamentally interwoven with causing complex effects.
Do you mean that there is no point at which exploitation is favored over exploration?
I mean that there is no point at which exploitation is favored exclusively over exploration.
Do you mean.… “Gut feeling: I’d probably sacrifice myself to create a superhuman artilect, but not my kids….”
I’m 20 years old—I don’t have any kids yet. If I did, I might very well feel differently. What I do mean is that I believe it to be culturally pretentious, and even morally wrong (according to my personal system of morals), to assert that it is better to hold back technological progress if necessary to preserve the human status quo, rather than allow ourselves to evolve into and ultimately be replaced by a superior civilization. I have the utmost faith in Nature to ensure that eventually, everything keeps getting better on average, even if there are occasional dips due to, e.g., wars; but if we can make the transition to a machine civilization smooth and gradual, I hope there won’t even have to be a war (a la Hugo de Garis).
What is your best guess at why people associated with SI are worried about AI risk?
Well, the trivial response is to say “that’s why they’re associated with SI.” But I assume that’s not how you meant the question. There are a number of reasons to become worried about AI risk. We see AI disasters in science fiction all the time. Eliezer makes pretty good arguments for AI disasters. People observe that a lot of smart folks are worried about AI risk, and it seems to be part of the correct contrarian cluster. But most of all, I think it is a combination of fear of the unknown and implicit beliefs about the meaning and value of the concept “human”.
If you had to fix the arguments for the proponents of AI-risk, what would be the strongest argument in favor of it?
In my opinion, the strongest argument in favor of AI-risk is the existence of highly intelligent but highly deranged individuals, such as the Unabomber. If mental illness is a natural attractor in mind-space, we might be in trouble.
Also, do you expect there to be anything that could possibly change your mind about the topic and make you worried?
Naturally. I was somewhat worried about AI-risk before I started studying and thinking about intelligence in depth. It is entirely possible that my feelings about AI-risk will follow a Wundt curve, and that once I learn even more about the nature of intelligence, I will realize we are all doomed for one reason or another. Needless to say, I don’t expect this, but you never know what you might not know.
I have the utmost faith in Nature to ensure that eventually, everything keeps getting better on average
The laws of physics don’t care. What process do you think explains the fact that you have this belief? If the truth of a belief isn’t what causes you to have it, having that belief is not evidence for its truth.
I’m afraid it was no mistake that I used the word “faith”!
This belief does not appear to conflict with the truth (or at least that’s a separate debate) but it is also difficult to find truthful support for it. Sure, I can wave my hands about complexity and entropy and how information can’t be destroyed but only created, but I’ll totally admit that this does not logically translate into “life will be good in the future.”
The best argument I can give goes as follows. For the sake of discussion, at least, let’s assume MWI. Then there is some population of alternate futures. Now let’s assume that the only stable equilibria are entirely valueless state ensembles such as the heat death of the universe. With me so far? OK, now here’s the first big leap: let’s say that our quantification of value, from state ensembles to the nonnegative reals, can be approximated by a continuous function. Therefore, by application of Conley’s theorem, the value trajectories of alternate futures fall into one of two categories: those which asymptotically approach 0, and those which asymptotically approach infinity. The second big leap involves disregarding those alternate futures which approach zero. Not only will you and I die in those futures, but we won’t even be remembered; none of our actions or words will be observed beyond a finite time horizon along those trajectories. So I conclude that I should behave as if the only trajectories are those which asymptotically approach infinity.
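A toy numerical sketch of the dichotomy being invoked, under invented dynamics: the squaring map below is an assumption of this illustration (it is not part of davidad’s argument and is far simpler than anything Conley’s theorem addresses), but it shows what it looks like for the only finite attractor to be the valueless state v = 0, with every other trajectory either falling into it or growing without bound.

```python
# Toy dynamic: v_{n+1} = v_n ** 2. Fixed points are 0 (stable) and 1 (unstable);
# every trajectory either decays to 0 or diverges, with v0 = 1 the measure-zero
# boundary between the two basins.

def fate(v0: float, steps: int = 60, blowup: float = 1e6) -> str:
    v = v0
    for _ in range(steps):
        v = v * v
        if v > blowup:
            return "diverges"
    return "approaches 0" if v < 1e-6 else "sits on the unstable boundary"

for v0 in (0.3, 0.9, 0.999, 1.0, 1.001, 1.5, 4.0):
    print(f"v0 = {v0:6.3f} -> {fate(v0)}")
```

The sketch only illustrates the claimed shape of the trajectory space; it does not establish that real value trajectories have this structure.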
It seems to me like you assume that you have no agency in pushing the value trajectories of alternate futures towards infinity rather than zero, and I don’t see why.
Is this a variant of quantum suicide, with the “suicide” part replaced by “dead and forgotten in the long run, whatever the cause”?
I happen to value technological progress as an intrinsic good, so classifying a Singularity as “positive” or “negative” is not easy for me.
A uniform category of “good” or “positive” fails to compare its elements. Just how good are different AIs, compared to each other? Can one be much better than another? There is an opportunity cost for settling for comparatively worse AIs. Given the astronomical scale of consequences, any difference may be quite significant, which would make it an important problem to ensure the creation of one of the better possible AIs, rather than an AI that technology would stumble on by default.
I reject the notion that one can factorize intelligence from goals
Human intelligence has been successfully applied to achieve many goals which were not applicable or trainable in the environment of ancestral adaptation, such as designing, building, and operating cars, playing chess, sending people to the moon and back, and programming computers. It is clear that goals and intelligence can be factorized, as a matter of simple observation.
Are those goals instrumental, or terminal?
With humans being such imperfect consequentialists, there is not always a clear distinction between instrumental and terminal goals. Much that we consider fun to do also furthers higher goals.
But even if you assume all the goals are instrumental, my point still stands. That a goal was adopted because it furthers a higher level goal doesn’t change the fact that it could be successfully plugged into human intelligence.
But even if you assume all the goals are instrumental, my point still stands. That a goal was adopted because it furthers a higher level goal doesn’t change the fact that it could be successfully plugged into human intelligence.
Sure. But the central question is “are higher level goals arbitrary?”, and while “Well, the subgoals we use to further those higher level goals are arbitrary along a few dimensions” is a start to answering that question, it is far from an end.
But the central question is “are higher level goals arbitrary?”
Wrong. The central question is “Can arbitrary goals be successfully plugged into a general goal achieving system?”, and I have shown examples where they can.
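For concreteness, here is a minimal sketch of the factorization being claimed: a generic brute-force planner whose goal is an arbitrary, swappable utility function. The toy world model, the action set, and the `plan`/`simulate` helpers are all invented for this illustration; nothing here is anyone’s actual proposal for how an AGI would work.

```python
from itertools import product

def plan(actions, simulate, utility, start, horizon=3):
    """Search all action sequences of length `horizon` and return the best one.

    The search machinery never inspects what `utility` rewards; any function of
    the final state can be plugged in.
    """
    best_seq, best_score = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        state = dict(start)
        for action in seq:
            state = simulate(state, action)
        score = utility(state)
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score

def simulate(state, action):
    # Toy dynamics: factories compound later paperclip output.
    state = dict(state)
    if action == "build_factory":
        state["factories"] += 1
    elif action == "make_paperclips":
        state["paperclips"] += state["factories"]
    return state

start = {"paperclips": 0, "factories": 1}
actions = ["build_factory", "make_paperclips"]

# The same planner with two different terminal goals plugged in:
print(plan(actions, simulate, lambda s: s["paperclips"], start))
print(plan(actions, simulate, lambda s: s["factories"], start))
```

Whether a goal slotted in this way would remain stable in a system smart enough to modify its own goal slot is exactly what the surrounding exchange disputes; the sketch only shows that the factorization is coherent for a simple planner.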
Perhaps one could give it a compulsion to optimize for paperclips, but I’d expect it to either put the compulsion on hold while it develops amazing fabrication, mining and space travel technologies, and never completely turn its available resources into paperclips since that would mean no chance of more paperclips in the future; or better yet, rapidly expunge the compulsion through self-modification.
As far as I can tell, that’s what you’re discussing, and so it sounds like you agree with him. Did I misread disagreement into this comment, or what am I missing here?
The section you quote allows for the possibility that an AI could be given a “compulsion” to optimize for paperclips, which it would eventually shrug off, whereas I am confident that an AI could be given a utility function that would make it actually optimize for paperclips.
Okay; but the examples you gave seem to me to be more similar to compulsions than to utility functions. A person can care a lot about cars, and cars can become a major part of human society, but they’re not the point of human society- if they stop serving their purposes they’ll go the way of the horse and buggy. I’m not sure I can express the meaning I’m trying to convey cleanly using that terminology, so maybe I ought to restart.
My model of davidad’s view is that part of general intelligence, as opposed to narrow intelligence, is varied and complex goals. We could make a narrow AI which only cared about the number of paperclips in the universe, but in order to make an intelligence that’s general we need to make it also care about the future, planning, existential risk, and so on.
And so you might get a vibrant interstellar civilization of synthetic intelligences- that happens to worship paperclips, and uses them for currency and religious purposes- rather than a dead world with nothing but peculiarly bent metal.
but the examples you gave seem to me to be more similar to compulsions than to utility functions
I would have liked to use examples of plugging in clearly terminal values to a general goal achieving system. But the only current or historical general goal achieving systems are humans, and it is notoriously difficult to figure out what humans’ terminal values are.
My model of davidad’s view is that part of general intelligence, as opposed to narrow intelligence, is varied and complex goals. We could make a narrow AI which only cared about the number of paperclips in the universe, but in order to make an intelligence that’s general we need to make it also care about the future, planning, existential risk, and so on.
I am not claiming that you could give an AGI an arbitrary goal system that suppresses the “Basic AI Drives”, but that those drives will be effective instrumental values, not lost purposes, and while a paperclip maximizing AGI will have sub goals such as controlling resources and improving its ability to predict the future, the achievement of those goals will help it to actually produce paperclips.
I am not claiming that you could give an AGI an arbitrary goal system that suppresses the “Basic AI Drives”, but that those drives will be effective instrumental values, not lost purposes, and while a paperclip maximizing AGI will have sub goals such as controlling resources and improving its ability to predict the future, the achievement of those goals will help it to actually produce paperclips.
It sounds like we agree: paperclips could be a genuine terminal value for AGIs, but a dead future doesn’t seem all that likely from AGIs (though it might be likely from AIs in general).
a dead future doesn’t seem all that likely from AGIs
What? A paperclip AGI with first mover advantage would self-improve beyond the point where cooperating with humans has any instrumental value, become a singleton, and tile the universe with paperclips.
What? A paperclip AGI with first mover advantage would self-improve beyond the point where cooperating with humans has any instrumental value, become a singleton, and tile the universe with paperclips.
Oh, I agree that humans die in such a scenario, but I don’t think the ‘tile the universe’ part counts as “dead” if the AGI has AI drives.
(I reject the notion that one can factorize intelligence from goals, so that one could take a superintelligence and fuse it with a goal to optimize for paperclips.
Why would you believe that? Evolution was more than capable of building an intelligence that optimized for whatever goals it needed, notably reproduction and personal survival. Granted, its version was imperfect, since humans have enough conflicting goals that we can sometimes make moves that are objectively bad for the perpetuation of our gametes, not to mention the obvious failure cases like asexuals. That said, evolution has fat fingers. We can do better, and any AIs we build will be able to do even better.
I promise you that if the production of paperclips was a survival trait in the ancestral environment, above all else, we would all be paperclip maximizers. We would consider paperclips profound and important, and we would be loath to remove the desire to make paperclips—any more than we would be inclined now to pare out our own sex drive and self-preservation instinct.
EDIT: I do think the scenario of simply immediately turning everything into paperclips is naive. A superintelligence would have an enormous incentive to devote its resources to research and development for optimizing its goals as rapidly as possible, and would probably spend a lot of time simply thinking before actually embarking on a large-scale manufacture of paperclips. That’s still not good for us, though, because even in that case, we’re clearly a lot more useful to it as solid-state paperclip R&D labs than as human beings.
Upvoted for honesty.
Honesty seems like a rather low bar for this community, don’t you think? Maybe you mean something more like personal insight? (Presumably most people who don’t admit their own biases aren’t being dishonest, but just don’t realize or aren’t convinced that they are biased.)