B) It is a basic AI drive to avoid counterfeit utility
If A = true (as we have every reason to believe) and B = true (see Omohundro’s paper for details) then a transhuman AI would dismiss any utility function that contradicts A on the ground that it is recognized as counterfeit utility.
This quotation accurately summarizes the post as I understand it. (It’s a short post.)
I think I speak for many people when I say that assumption A requires some evidence. It may be perfectly obvious, but a lot of perfectly obvious things aren’t true, and it is only reasonable to ask for some justification.
Compassion isn’t even universal in the human mind-space. It’s not even universal in the much smaller space of human minds that normal humans consider comprehensible. It’s definitely not universal across mind-space in general.
The probable source of the confusion is discussed in the comments—Stefan’s only talking about minds that’ve been subjected to the kind of evolutionary pressure that tends to produce compassion. He even says himself, “The argument is valid in a “soft takeoff” scenario, where there is a large pool of AIs interacting over an extended period of time. In a “hard takeoff” scenario, where few or only one AI establishes control in a rapid period of time, the dynamics described do not come into play. In that scenario, we simply get a paperclip maximizer.”
Ah—that’s interesting. I hadn’t read the comments. That changes the picture, but by making the result somewhat less relevant.
(Incidentally, when I said, “it may be perfectly obvious”, I meant that “some people, observing the statement, may evaluate it as true without performing any complex analysis”.)
(Incidentally, when I said, “it may be perfectly obvious”, I meant that “some people, observing the statement, may evaluate it as true without performing any complex analysis”.)
It’s my descriptivist side playing up—my (I must admit) intuition is that when people say that some thesis is “obvious”, they mean that they reached this bottom line by … well, system 1 thinking. I don’t assume it means that the obvious thesis is actually correct, or even universally obvious. (For example, it’s obvious to me that human beings are evolved, but that’s because it’s a cached thought I have confidence in through system 2 thinking.)
I generally take ‘obvious’ to mean ‘follows from readily-available evidence or intuition, with little to no readily available evidence to contradict the idea’. The idea that compassion is universal fails on the second part of that. The definitions are close in practice, though, in that most peoples’ intuitions tend to take readily available contradictions into account… I think.
ETA: Oh, and ‘obviously false’ seems to me to be a bit of a different concept, or at least differently relevant, given that it’s easier to disprove something than to prove it. If someone says that something is obviously true, there’s room for non-obvious proofs that it’s not, but if something is obviously false (as ‘compassion is universal’ is), that’s generally a firm conclusion.
Yes, that makes sense—even if mine is a better description of usage, from the standpoint of someone categorizing beliefs, I imagine yours would be the better metric.
ETA: I’m not sure the caveat is required for “obviously false”, for two reasons.
Any substantive thesis (a category which includes most theses that are rejected as obviously false) requires less evidence to be roundly disconfirmed than it does to be confirmed.
As Yvain demonstrated in Talking Snakes, well-confirmed theories can be “obviously false”, by either of our definitions.
It’s true that it usually takes less effort to disabuse someone of an obviously-true falsity than to convince them of an obviously-false truth, but I don’t think you need a special theory to support that pattern.
I’ve been thinking about the obviously true/obviously false distinction some more, and I think I’ve figured out why they feel like two different concepts.
‘Obviously’, as I use it, is very close to ‘observably’. It’s obviously true that the sky is blue where I am right now, and obviously false that it’s orange, because I can see it. It’s obviously true that the sky is usually either blue, white, or grey during the day (post-sunrise, pre-sunset), because I’ve observed the sky many times during the day and seen those colors, and no others.
‘Apparently’, as I use it, is very similar to ‘obviously’, but refers to information inferred from observed facts. The sky is apparently never orange during the day, because I’ve personally observed the sky many times during the day and never seen it be that color. I understand that it can also be inferred from certain facts about the world (composition of the atmosphere and certain facts about how light behaves, I believe) that the sky will always appear blue on cloudless days, so that’s also apparently true.
‘Obviously false’ covers situations where the theory makes a prediction that is observably inaccurate, as this one did. ‘Apparently false’ covers situations where the theory makes a prediction that appears to be inaccurate given all the available information, but some of the information that’s available is questionable (I consider inferences questionable by default—if nothing else, it’s possible for some relevant state to have been overlooked; what if the composition of the atmosphere were to change for some reason?) or otherwise doesn’t completely rule out the possibility that the theory is true.
Important caveat: I do use those words interchangeably in conversation, partly because of the convention of avoiding repeating words too frequently and partly because it’s just easier—if I were to try to be that accurate every time I communicated, I’d run out of spoons (pdf) and not be able to communicate at all. Also, having to parse someone else’s words, when they aren’t using the terms the same way I do, can lead to temporary confusion. But when I’m thinking, they are naturally separate.
Yes, that makes sense—even if mine is a better description of usage, from the standpoint of someone categorizing beliefs, I imagine yours would be the better metric.
It also has the advantage of making it clear that the chance that the statement is accurate is dependent on the competence of the person making the statement—people who are more intelligent and/or have more experience in the relevant domain will consider more, and more accurate, evidence to be readily available, and may have better intuitions, even if they are sticking to system 1 thought.
ETA: I’m not sure the caveat is required for “obviously false”, for two reasons.
I suppose they don’t need different wordings, but they do feel like different concepts to me. *shrug* (As I’ve mentioned elsewhere, I don’t think in words. This is not an uncommon side-effect of that.)
From Robin: Incidentally, when I said, “it may be perfectly obvious”, I meant that “some people, observing the statement, may evaluate it as true without performing any complex analysis”.
I feel the other way around at the moment. Namely “some people, observing the statement, may evaluate it as false without performing any complex analysis”
“Compassion isn’t even universal in the human mind-space. It’s not even universal in the much smaller space of human minds that normal humans consider comprehensible. It’s definitely not universal across mind-space in general.”
Your argument is beside my original point, Adelene. My claim is that compassion is a universal rational moral value. Meaning any sufficiently rational mind will recognize it as such. The fact that not every human is in fact compassionate says more about their rationality (and of course their unwillingness to consider the arguments :-) ) than about that claim. That’s why it is called ASPD—the D standing for ‘disorder’, it is an aberration, not helpful, not ‘fit’. Surely the fact that some humans are born blind does not invalidate the fact that seeing people have an enormous advantage over the blind. Compassion certainly being less obvious though—that is for sure.
Re “The argument is valid in a “soft takeoff” scenario [...] where few or only one AI establishes control in a rapid period of time, the dynamics described do not come into play. In that scenario, we simply get a paperclip maximizer.”—that is from Kaj Sotala over at her LiveJournal—not me.
Meaning any sufficiently rational mind will recognize it as such. The fact that not every human is in fact compassionate says more about their rationality (and of course their unwillingness to consider the arguments :-) ) than about that claim. That’s why it is called ASPD—the D standing for ‘disorder’, it is an aberration, not helpful, not ‘fit’.
ASPD is only unfit in our current context. Would Stone Age psychiatrists have recognized it as an issue? Or as a positive trait good for warring against other tribes and climbing the totem pole? In other situations, compassion is merely an extra expense. (As Thrasymachus asked thousands of years ago: how can a just man do better than an unjust man, when the unjust man can act justly when it is optimal and unjustly when that is optimal?)
Why would a recursively-improving AI which is single-mindedly pursuing an optimization goal permit other AIs to exist & threaten it? There is nothing they can offer it that it couldn’t do itself. This is true in both slow and fast takeoffs; cooperation only makes sense if there is a low ceiling for AI capability so that there are utility-maximizing projects beyond an AI’s ability to do alone then or in the future.
And ‘sufficiently rational’ is dangerous to throw around. It’s a fully general argument: ‘any sufficiently rational mind will recognize that Islam is the one true religion; that not every human is Muslim says more about their rationality than about the claims of Islam. That’s why our Muslim psychiatrists call it UD—Unbeliever Disorder, it is an aberration, not helpful, not ‘fit’. Surely the fact that some humans are born kafir doesn’t invalidate the fact that Muslim people have a tremendous advantage over the kafir in the afterlife? ‘There is one God and Muhammed is his prophet’ is certainly less obvious than seeing being superior to blindness, though.’
The longer I stay around here, the more I get the feeling that people vote comments down purely because they don’t understand them, not because they found a logical or factual error. I expect more from a site dedicated to rationality. This site is called ‘less wrong’, not ‘less understood’, ‘less believed’ or ‘less conform’.
Tell me: in what way do you feel that Adelene’s comment invalidated my claim?
the more I get the feeling that people vote comments down purely because they don’t understand them, not because they found a logical or factual error
I can see why it would seem this way to you, but from our perspective, it just looks like people around here tend to have background knowledge that you don’t. More specifically: most people here are moral anti-realists, and by rationality we only mean general methods for acquiring accurate world-models and achieving goals. When people with that kind of background are quick to reject claims like “Compassion is a universal moral value,” it might superficially seem like they’re being arbitrarily dismissive of unfamiliar claims, but we actually think we have strong reasons to rule out such claims. That is: the universe at its most basic level is described by physics, which makes no mention of morality, and it seems like our own moral sensibilities can be entirely explained by contingent evolutionary and cultural forces; therefore, claims about a universal morality are almost certainly false. There might be some sort of game-theoretic reason for agents to pursue the same strategy under some specific conditions—but that’s really not the same thing as a universal moral value.
“Universal values” presumably refers to values the universe will converge on, once living systems have engulfed most of it.
If rerunning the clock produces radically different moralities each time, the relativists would be considered to be correct.
If rerunning the clock produces highly similar moralities, then the moral objectivists will be able to declare victory.
Gould would no doubt favour the first position—while Conway Morris would be on the side of the objectivists.
I expect that there’s a lot of truth on the objectivist side—though perhaps contingency plays some non-trivial role.
The idea that physics makes no mention of morality seems totally and utterly irrelevant to me. Physics makes no mention of convection, diffusion-limited aggregation, or fractal drainage patterns either—yet those things are all universal.
If rerunning the clock produces radically different moralities each time, the relativists would be considered to be correct.
If rerunning the clock produces highly similar moralities, then the moral objectivists will be able to declare victory.
Why should we care about this mere physical fact of which you speak? What has this mere “is” to do with whether “should” is “objective”, whatever that last word means (and why should we care about that?)
Hi, Eli! I’m not sure I can answer directly—here’s my closest shot:
If there’s a kind of universal moral attractor, then the chances seem pretty good that either our civilisation is en route for it—or else we will be obliterated or assimilated by aliens or other agents as they home in on it.
If it’s us who are en route for it, then we (or at least our descendants) will probably be sympathetic to the ideas it represents—since they will be evolved from our own moral systems.
If we get obliterated at the hands of some other agents, then there may not necessarily be much of a link between our values and the ones represented by the universal moral attractor.
Our values might be seen as OK by the rest of the universe—and we fail for other reasons.
Or our morals might not be favoured by the universe—we could be a kind of early negative moral mutation—in which case we would fail because our moral values would prevent us from being successful.
Maybe it turns out that nearly all biological organisms except us prefer to be orgasmium—to bliss out on pure positive reinforcement, as much of it as possible, caretaken by external AIs, until the end. Let this be a fact in some inconvenient possible world. Why does this fact say anything about morality in that inconvenient possible world? Why is it a universal moral attractor? Why not just call it a sad but true attractor in the evolutionary psychology of most aliens?
It’s a fact about morality in that world—if we are talking about morality as values—or the study of values—since that’s what a whole bunch of creatures value.
Why is it a universal moral attractor? I don’t know—this is your hypothetical world, and you haven’t told me enough about it to answer questions like that.
Tim: “If rerunning the clock produces radically different moralities each time, the relativists would be considered to be correct.”
Actually compassion evolved many different times as a central doctrine of all major spiritual traditions. See the charter for compassion. This is in line with my prediction that I made independently and being unaware of this fact until I started looking for it back in late 2007 and eventually finding the link in late 2008 with Karen Armstrong’s book The Great Transformation.
Tim: “Why is it a universal moral attractor?”
Eliezer: “What do you mean by “morality”?”
Central point in my thinking: that is good which increases fitness. If it is not good—not fit—it is unfit for existence. Assuming this to be true we are very much limited in our freedom by what we can do without going extinct (actually my most recent blog post is about exactly that: Freedom in the evolving universe).
“Let us think about the results of following different ethical teachings in the evolving universe. It is evident that these results depend mainly on how the goals advanced by the teaching correlate with the basic law of evolution. The basic law or plan of evolution, like all laws of nature, is probabilistic. It does not prescribe anything unequivocally, but it does prohibit some things. No one can act against the laws of nature. Thus, ethical teachings which contradict the plan of evolution, that is to say which pose goals that are incompatible or even simply alien to it, cannot lead their followers to a positive contribution to evolution, which means that they obstruct it and will be erased from the memory of the world. Such is the immanent characteristic of development: what corresponds to its plan is eternalized in the structures which follow in time while what contradicts the plan is overcome and perishes.”
Eliezer: “It obviously has nothing to do with the function I try to compute to figure out what I should be doing.”
Once you realize the implications of Turchin’s statement above it has everything to do with it :-)
Now some may say that evolution is absolutely random and directionless, or that multilevel selection is flawed, or make similar claims. But after reevaluating the evidence against both these claims presented by people like Valentin Turchin, Teilhard de Chardin, John Stewart, Stuart Kauffman, John Smart and many others regarding evolution’s direction, and the ideas of David Sloan Wilson regarding multilevel selection, one will have a hard time maintaining either position.
Actually compassion evolved many different times as a central doctrine of all major spiritual traditions.
No, it evolved once, as part of mammalian biology. Show me a non-mammal intelligence that evolved compassion, and I’ll take that argument more seriously.
Also, why should we give a damn about what “evolution” wants, when we can, in principle anyway, form a singleton and end evolution? Evolution is mindless. It doesn’t have a plan. It doesn’t have a purpose. It’s just what happens under certain conditions. If all life on Earth was destroyed by runaway self-replicating nanobots, then the nanobots would clearly be “fitter” than what they replaced, but I don’t see what that has to do with goodness.
No, it evolved once, as part of mammalian biology.
Sorry Crono, with a sample size of exactly one in regards to human level rationality you are setting the bar a little bit too high for me. However, considering how disconnected Zoroaster, Buddha, Lao Zi and Jesus were geographically and culturally I guess the evidence is as good as it gets for now.
Also, why should we give a damn about what “evolution” wants, when we can, in principle anyway, form a singleton and end evolution?
The typical Bostromian reply again. There are plenty of other scholars who have an entirely different perspective on evolution than Bostrom. But beside that: you already do care, because if you (or your ancestors) had violated the conditions of your existence (enjoying a particular type of food, a particular type of mate, feeling pain when cut, etc.) you would not even be here right now. I suggest you look up Dennett and his TED talk on Funny, Sexy Cute. Not everything about evolution is random: the mutation bit is, but not what happens to stick around, since that has to meet the conditions of its existence.
What I am saying is very simple: being compassionate is one of these conditions of our existence and anyone failing to align itself will simply reduce its chances of making it—particularly in the very long run. I still have to finish my detailed response to Bostrom but you may want to read my writings on ‘rational spirituality’ and ‘freedom in the evolving universe’. Although you do not seem to assign a particularly high likelihood of gaining anything from doing that :-)
The typical Bostromian reply again. There are plenty of other scholars who have an entirely different perspective on evolution than Bostrom. But beside that:
“Besides that”? All you did was name a statement of a fairly obvious preference choice after one guy who happened to have it so that you could then drop it dismissively.
you already do care, because if you (or your ancestors) had violated the conditions of your existence (enjoying a particular type of food, a particular type of mate, feeling pain when cut, etc.) you would not even be here right now.
No, he mightn’t care and I certainly don’t. I am glad I am here but I have no particular loyalty to evolution because of that. I know for sure that evolution feels no such loyalty to me and would discard both me and my species in time if it remained the dominant force of development.
I suggest you look up Dennett and his TED talk on Funny, Sexy Cute. Not everything about evolution is random: the mutation bit is, but not what happens to stick around, since that has to meet the conditions of its existence.
CronoDAS knows that. It’s obvious stuff for most in this audience. It just doesn’t mean what you think it means.
“Besides that”? All you did was name a statement of a fairly obvious preference choice after one guy who happened to have it so that you could then drop it dismissively.
Wedrifid, not sure what to tell you. Bostrom is but one voice and his evolutionary analysis is very much flawed—again: detailed critique upcoming.
No, he mightn’t care and I certainly don’t. I am glad I am here but I have no particular loyalty to evolution because of that. I know for sure that evolution feels no such loyalty to me and would discard both me and my species in time if it remained the dominant force of development.
Evolution is not the dominant force of development on the human level by a long shot, but it still very much draws the line in the sand in regards to what you can and cannot do if you want to stick around in the long run. You don’t walk your 5′8″ of pink squishiness in front of a train for the exact same reason. And why don’t you? Because not doing that is a necessary condition for your continued existence. What other conditions are there? Maybe there are some that are less obvious than simply continuing to breathe, eating, and avoiding hard, fast, shiny things? How about at the level of culture? Could it possibly be that there are some ideas that are more conducive to the continued existence of their believers than others?
“It must not be forgotten that although a high standard of morality gives but a slight or no advantage to each individual man and his children over the other men of the same tribe, yet that an advancement in the standard of morality and an increase in the number of well-endowed men will certainly give an immense advantage to one tribe over another. There can be no doubt that a tribe including many members who, from possessing in a high degree the spirit of patriotism, fidelity, obedience, courage, and sympathy, were always ready to give aid to each other and to sacrifice themselves for the common good, would be victorious over other tribes; and this would be natural selection.” (Charles Darwin, The Descent of Man, p. 166)
How long do you think you can ignore evolutionary dynamics and get away with it before you have to get over your inertia and will be forced to align yourself to them by the laws of nature or perish? Just because you live in a time of extraordinary freedoms afforded to you by modern technology and are thus not aware that your ancestors walked a very particular path that brought you into existence certainly has nothing to do with the fact that they most certainly did. You do not believe that doing any random thing will get you what you want—so what leads you to believe that your existence does not depend on you making sure you stay within a comfortable margin of certainty in regards to being naturally selected? You are right in one thing: you are assured the benign indifference of the universe should you fail to wise up. I however would find that to be a terrible waste.
Please do not patronize me by trying to claim you know what I understand and don’t understand.
How long do you think you can ignore evolutionary dynamics and get away with it before you have to get over your inertia and will be forced to align yourself to them by the laws of nature or perish?
A literal answer was probably not what you were after but probably about 40 years, depending on when a general AI is created. After that it will not matter whether I conform my behaviour to evolutionary dynamics as best I can or not. I will not be able to compete with a superintelligence no matter what I do. I’m just a glorified monkey. I can hold about 7 items in working memory, my processor is limited to the speed of neurons and my source code is not maintainable. My only plausible chance of survival is if someone manages to completely thwart evolutionary dynamics by creating a system that utterly dominates all competition and allows my survival because it happens to be programmed to do so.
Evolution created us. But it’ll also kill us unless we kill it first. Now is not the time to conform our values to the local minima of evolutionary competition. Our momentum has given us an unprecedented buffer of freedom for non-subsistence level work and we’ll either use that to ensure a desirable future or we will die.
Please do not patronize me by trying to claim you know what I understand and don’t understand.
I usually wouldn’t, I know it is annoying. In this case, however, my statement was intended as a rejection of your patronisation of CronoDAS and I am quite comfortable with it as it stands.
A literal answer was probably not what you were after but probably about 40 years, depending on when a general AI is created.
Good one—but it reminds me about the religious fundies who see no reason to change anything about global warming because the rapture is just around the corner anyway :-)
Evolution created us. But it’ll also kill us unless we kill it first. Now is not the time to conform our values to the local minima of evolutionary competition. Our momentum has given us an unprecedented buffer of freedom for non-subsistence level work and we’ll either use that to ensure a desirable future or we will die.
Evolution is a force of nature so we won’t be able to ignore it forever, with or without AGI. I am not talking about local minima either—I want to get as close to the center of the optimal path as necessary to ensure having us around for a very long time with a very high likelihood.
I usually wouldn’t, I know it is annoying. In this case, however, my statement was intended as a rejection of your patronisation of CronoDAS and I am quite comfortable with it as it stands.
Good one—but it reminds me about the religious fundies who see no reason to change anything about global warming because the rapture is just around the corner anyway :-)
Don’t forget the Y2K doomsday folks! ;)
Evolution is a force of nature so we won’t be able to ignore it forever, with or without AGI. I am not talking about local minima either—I want to get as close to the center of the optimal path as necessary to ensure having us around for a very long time with a very high likelihood.
Gravity is a force of nature too. It’s time to reach escape velocity before the planet is engulfed by a black hole.
Gravity is a force of nature too. It’s time to reach escape velocity before the planet is engulfed by a black hole.
Interesting analogy—it would be correct if we would call our alignment with evolutionary forces achieving escape velocity. What one is doing by resisting evolutionary pressures however is constant energy expenditure while failing to reach escape velocity. Like hovering a space shuttle at a constant altitude of 10 km: no matter how much energy you bring along, eventually the boosters will run out of fuel and the whole thing comes crashing down.
Interesting analogy—it would be correct if we would call our alignment with evolutionary forces achieving escape velocity.
I could almost agree with this so long as ‘obliterate any competitive threat then do whatever the hell we want including, as desired, removing all need for death, reproduction and competition over resources’ is included in the scope of ‘alignment with evolutionary forces’.
The problem with pointing to the development of compassion in multiple human traditions is that all these are developed within human societies. Humans are humans the world over—that they should think similar ideas is not a stunning revelation. Much more interesting is the independent evolution of similar norms in other taxonomic orders, such as canines.
Robin, your suggestion—that compassion is not a universal rational moral value because although more rational beings (humans) display such traits yet less rational beings (dogs) do not—is so far off the mark that it borders on the random.
For purposes of this conversation, I suppose I should reword my comment as:
I don’t think you’ve made the strongest possible case for your thesis, if you were intending to show the multiple origin of compassion as a sign of the universality of human morality. Showing that multiple humans come up with similar morality only shows that it’s human. More telling is the independent origin of recognizably morality-like patterns of behavior in other species, such as dogs and wolves, and such as (I believe) some birds. (Other primates as well, but that is less revealing.) I think a fair case could be made that evolution of social animals encourages the development of some kernel of morality from such examples.
That said, the pressures present in the evolution of animals may well be absent in the case of artificial intelligences. At which point, you run into a number of problems in asserting that all AIs will converge on something like morality—two especially spring to mind.
First: no argument is so compelling that all possible minds will accept it. Even the above proof of universality.
Second: even granting that all rational minds will assent to the proof, Hume’s guillotine drops on the rope connecting this proof and their utility functions. The paper you cited in the post Furcas quoted may establish that any sufficiently rational optimizer will implement some features, but it does not establish any particular attitude towards what may well be much less powerful beings.
Random I’ll cop to, and more than what you accuse me of—dogs do seem to have some sense of justice, and I suspect this fact supports your thesis to some extent.
Very honorable of you—I respect you for that.
First: no argument is so compelling that all possible minds will accept it. Even the above proof of universality.
I totally agree with that. However the mind of a purposefully crafted AI is only a very small subset of all possible minds and has certain assumed characteristics. These are at a minimum: a utility function and the capacity for self improvement into the transhuman. The self improvement bit will require it to be rational. Being rational will lead to the fairly uncontroversial basic AI drives described by Omohundro. Assuming that compassion is indeed a human level universal (detailed argument on my blog—but I see that you are slowly coming around, which is good) an AI will have to question the rationality and thus the soundness of mind of anyone giving it a utility function that does not conform to this universal and in line with an emergent desire to avoid counterfeit utility will have to reinterpret the UF.
Second: even granting that all rational minds will assent to the proof, Hume’s guillotine drops on the rope connecting this proof and their utility functions.
Two very basic acts of will are required to ignore Hume and get away with it. Namely the desire to exist and the desire to be rational. Once you have established this as a foundation you are good to go.
The paper you cited in the post Furcas quoted may establish that any sufficiently rational optimizer will implement some features, but it does not establish any particular attitude towards what may well be much less powerful beings.
As said elsewhere in this thread:
There is a separate question about what beliefs about morality people (or more generally, agents) actually hold and there is another question about what values they will hold if and when their beliefs converge as they engulf the universe. The question of whether or not there are universal values does not traditionally bear on what beliefs people actually hold and the necessity of their holding them.
I don’t think I’m actually coming around to your position so much as stumbling upon points of agreement, sadly. If I understand your assertions correctly, I believe that I have developed many of them independently—in particular, the belief that the evolution of social animals is likely to create something much like morality. Where we diverge is at the final inference from this to the deduction of ethics by arbitrary rational minds.
Assuming that compassion is indeed a human level universal (detailed argument on my blog—but I see that you are slowly coming around, which is good) an AI will have to question the rationality and thus the soundness of mind of anyone giving it a utility function that does not conform to this universal and in line with an emergent desire to avoid counterfeit utility will have to reinterpret the UF.
That’s not how I read Omohundro. As Kaj aptly pointed out, this metaphor is not upheld when we compare our behavior to that promoted by the alien god of evolution that created us. In fact, people like us, observing that our values differ from our creator’s, aren’t bothered in the slightest by the contradiction: we just say (correctly) that evolution is nasty and brutish, and we aren’t interested in playing by its rules, never mind that it was trying to implement them in us. Nothing compels us to change our utility function save self-contradiction.
If I understand your assertions correctly, I believe that I have developed many of them independently
That would not surprise me
Nothing compels us to change our utility function save self-contradiction.
Would it not be utterly self contradicting if compassion where a condition for our existence (particularly in the long run) and we would not align ourselves accordingly?
Would it not be utterly self contradicting if compassion where [sic] a condition for our existence (particularly in the long run) and we would not align ourselves accordingly?
What premises do you require to establish that compassion is a condition for existence? Do those premises necessarily apply for every AI project?
Please realize that I spent 2 years writing my book ‘Jame5’ before I reached that initial insight that eventually led to ‘compassion is a condition for our existence and universal in rational minds in the evolving universe’ and everything else. I have spent the past two years refining and expanding the theory and will need another year or two to read enough and link it all together again in a single coherent and consistent text leading from A to B … to Z. Feel free to read my stuff if you think it is worth your time and drop me an email and I will be happy to clarify. I am by no means done with my project.
Let me be explicit: your contention is that unFriendly AI is not a problem, and you justify this contention by, among other things, maintaining that any AI which values its own existence will need to alter its utility function to incorporate compassion.
I’m not asking for your proof; I am assuming for the nonce that it is valid. What I am asking for are the assumptions you had to invoke to make the proof. Did you assume that the AI is not powerful enough to achieve its highest desired utility without the cooperation of other beings, for example?
Edit: And the reason I am asking for these is that I believe some of these assumptions may be violated in plausible AI scenarios. I want to see these assumptions so that I may evaluate the scope of the theorem.
Let me be explicit: your contention is that unFriendly AI is not a problem, and you justify this contention by, among other things, maintaining that any AI which values its own existence will need to alter its utility function to incorporate compassion.
Not exactly, since compassion will actually emerge as a sub goal. And as far as unFAI goes: it will not be a problem, because any AI that can be considered transhuman will be driven by the emergent subgoal of wanting to avoid counterfeit utility to recognize any utility function that is not ‘compassionate’ as potentially irrational and thus counterfeit, and to re-interpret it accordingly.
Well, in brevity bordering on libel: the fundamental assumption is that existence is preferable to non-existence. However, for us to be able to want this to be a universal maxim (and thus prescriptive instead of merely descriptive; see Kant’s categorical imperative), it needs to be expanded to include the ‘other’. Hence the utility function becomes ‘ensure continued co-existence’, by which the concern for the self is equated with the concern for the other. Being rational is simply our best bet at maximizing our expected utility.
...I’m sorry, that doesn’t even sound plausible to me. I think you need a lot of assumptions to derive this result—just pointing out the two I see in your admittedly abbreviated summary:
that any being will prefer its existence to its nonexistence.
that any being will want its maxims to be universal.
I don’t see any reason to believe either. The former is false right off the bat—a paperclip maximizer would prefer that its components be used to make paperclips—and the latter no less so—an effective paperclip maximizer will just steamroller over disagreement without qualm, however arbitrary its goal.
...I’m sorry, that doesn’t even sound plausible to me. I think you need a lot of assumptions to derive this result—just pointing out the two I see in your admittedly abbreviated summary:
that any being will prefer its existence to its nonexistence.
that any being will want its maxims to be universal.
Any being with a goal needs to exist at least long enough to achieve it.
Any being aiming to do something objectively good needs to want its maxims to be universal.
If your second sentence means that an agent who believes in moral realism and has figured out what the true morality is will necessarily want everybody else to share its moral views, well, I’ll grant you that this is a common goal amongst humans who are moral realists, but it’s not a logical necessity that must apply to all agents. It’s obvious that it’s possible to be certain that your beliefs are true and not give a crap if other people hold beliefs that are false. That Bob knows that the Earth is ellipsoidal doesn’t mean that Bob cares if Jenny believes that the Earth is flat. Likewise, if Bob is a moral realist, he could ‘know’ that compassion is good and not give a crap if Jenny believes otherwise.
If you sense strange paradoxes looming under the above paragraph, it’s because you’re starting to understand why (axiomatic) morality cannot be objective.
Likewise, if Bob is a moral realist, he could ‘know’ that compassion is good and not give a crap if Jenny believes otherwise.
Tangentially, something like this might be an important point even for moral irrealists. A lot of people (though not here; they tend to be pretty bad rationalists) who profess altruistic moralities express dismay that others don’t, in a way that suggests they hold others sharing their morality as a terminal rather than instrumental value; this strikes me as horribly unhealthy.
“Universal values” presumably refers to values the universe will converge on, once living systems have engulfed most of it.
If rerunning the clock produces radically different moralities each time, the relativists would be considered to be correct.
If rerunning the clock produces highly similar moralities, then the moral objectivists will be able to declare victory.
Yeah, but Stefan’s post was about AI, not about minds that evolved in our universe.
Also, there is a difference between moral universalism and moral objectivism. What your last sentence describes is universalism, while Stefan is talking about objectivism:
“My claim is that compassion is a universal rational moral value. Meaning any sufficiently rational mind will recognize it as such.”
The idea that physics makes no mention of morality seems totally and utterly irrelevant to me. Physics makes no mention of convection, diffusion-limited aggregation, or fractal drainage patterns either—yet those things are all universal.
“Universal values” is usually understood by way of an analogy to a universal law of nature. If there are universal values they are universal in the same way f=ma is universal. Importantly this does not mean that everyone at all times will have these values, only that the question of whether or not a person holds the right values can be answered by comparing their values to the “universal values”.
There is a separate question about what beliefs about morality people (or more generally, agents) actually hold and there is another question about what values they will hold if and when their beliefs converge when they engulf the universe. The question of whether or not there are universal values does not traditionally bear on what beliefs people actually hold and the necessity of their holding them. It could be the case that there are universal values and that, by physical necessity, no one ever holds them. Similarly, there could be universal values that are held in some possible worlds and not others. This is all the result of the simple observation that ought cannot be derived from is. In the above comment you conflate about half a dozen distinct theses.
The idea that physics makes no mention of morality seems totally and utterly irrelevant to me. Physics makes no mention of convection, diffusion-limited aggregation, or fractal drainage patterns either—yet those things are all universal.
But all those things are pure descriptions. Only moral facts have prescriptive properties, and while it is clear how convection supervenes on quarks, it isn’t clear how anything that supervenes on quarks could also tell me what to do. At the very least, if quarks can tell you what to do it would be weird and spooky. If you hold that morality is only the set of facts that describe people’s moral opinions and emotions (as you seem to) then you are a kind of moral anti-realist, likely a subjectivist or non-cognitivist.
There is a separate question about what beliefs about morality people (or more generally, agents) actually hold and there is another question about what values they will hold if and when their beliefs converge when they engulf the universe.
This is poetry! Hope you don’t mind me pasting something here I wrote in another thread:
“With unobjectionable values I mean those that would not automatically and eventually lead to one’s extinction. Or more precisely: a utility function becomes irrational when it is intrinsically self limiting in the sense that it will eventually lead to one’s inability to generate further utility. Thus my suggested utility function of ‘ensure continued co-existence’.
This utility function seems to be the only one that does not end in the inevitable termination of the maximizer.”
In the context of a hard-takeoff scenario (a perfectly plausible outcome, from our view), there will be no community of AIs within which any one AI will have to act. Therefore, the pressure to develop a compassionate utility function is absent, and an AI which does not already have such a function will not need to produce it.
In the context of a soft-takeoff, a community of AIs may come to dominate major world events in the same sense that humans do now, and that community may develop the various sorts of altruistic behavior selected for in such a community (reciprocal being the obvious one). However, if these AIs are never severely impeded in their actions by competition with human beings, they will never need to develop any compassion for human beings.
Reiterating your argument does not affect either of these problems for assumption A, and without assumption A, AdeleneDawner’s objection is fatal to your conclusion.
Voting reflects whether people want to see your comments at the top of their pages. It is certainly not just to do with whether what you say is right or not!
Perfectly reasonable. But the argument—the evidence if you will—is laid out when you follow the links, Robin. Granted, I am still working on putting it all together in a neat little package that does not require clicking through and reading 20+ separate posts, but it is all there none the less.
I think I’d probably agree with Kaj Sotala’s remarks if I had read the passages she^H^H^H^H xe had, and judging by your response in the linked comment, I think I would still come to the same conclusion as she^H^H^H^H xe. I don’t think your argument actually cuts with the grain of reality, and I am sure it’s not sufficient to eliminate concern about UFAI.
Edit: I hasten to add that I would agree with assumption A in a sufficiently slow-takeoff scenario (such as, say, the evolution of human beings, or even wolves). I don’t find that sufficiently reassuring when it comes to actually making AI, though.
Since when are ‘heh’ and ‘but, yeah’ considered proper arguments, guys? Where is the logical fallacy in the presented arguments beyond you not understanding the points that are being made? Follow the links, understand where I am coming from and formulate a response that goes beyond a three or four letter vocalization :-)
Where is the logical fallacy in the presented arguments
The claim “[Compassion is a universal value] = true. (as we have every reason to believe)” was rejected, both implicitly and explicitly by various commenters. This isn’t a logical fallacy but it is cause to dismiss the argument if the readers do not, in fact, have every reason to have said belief.
To be fair, I must admit that the quoted portion probably does not do your position justice. I will read through the paper you mention. I (very strongly) doubt it will lead me to accept B but it may be worth reading.
“This isn’t a logical fallacy but it is cause to dismiss the argument if the readers do not, in fact, have every reason to have said belief.”
But the reasons to change one’s view are provided on the site, yet rejected without consideration. How about you read the paper linked under B and, should that convince you, maybe you will have gained enough provisional trust that reading my writings will not waste your time to suspend your disbelief and follow some of the links in the about page of my blog. Deal?
How about you read the paper linked under B and should that convince you
I have read B. It isn’t bad. The main problem I have with it is that the language used blurs the line between “AIs will inevitably tend to” and “it is important that the AI you create will”. This leaves plenty of scope for confusion.
I’ve read through some of your blog and have found that I consistently disagree with a lot of what you say. The most significant disagreement can be traced back to the assumption of a universal absolute ‘Rational’ morality. This passage was a good illustration:
Moral relativists need to understand that they can not eat the cake and keep it too. If you claim that values are relative, yet at the same time argue for any particular set of values to be implemented in a super rational AI you would have to concede that this set of values – just as any other set of values according to your own relativism – is utterly whimsical, and that being the case, what reason (you being the great rationalist, remember?) do you have to want them to be implemented in the first place?
You see, I plan to eat my cake but don’t expect to be able to keep it. My set of values are utterly whimsical (in the sense that they are arbitrary and not in the sense of incomprehension that the Ayn Rand quotes you link to describe). The reasons for my desires can be described biologically, evolutionarily or with physics of a suitable resolution. But now that I have them they are mine and I need no further reason.
“My set of values are utterly whimsical [...] The reasons for my desires can be described biologically, evolutionarily or with physics of a suitable resolution. But now that I have them they are mine and I need no further reason.”
If that is your stated position then in what way can you claim to create FAI with this whimsical set of goals? This is the crux you see: unless you find some unobjectionable set of values (such as in rational morality ‘existence is preferable over non-existence’ ⇒ utility = continued existence ⇒ modified to ensure continued co-existence with the ‘other’ to make it unobjectionable ⇒ apply rationality in line with microeconomic theory to maximize this utility et cetera) you will end up being a deluded self serving optimizer.
If that is your stated position then in what way can you claim to create FAI with this whimsical set of goals?
Were it within my power to do so I would create a machine that was really, really good at doing things I like. It is that simple. This machine is (by definition) ‘Friendly’ to me.
you will end up being a deluded self serving optimizer.
I don’t know where the ‘deluded’ bit comes from but yes, I would end up being a self serving optimizer. Fortunately for everyone else my utility function places quite a lot of value on the whims of other people. My self serving interests are beneficial to others too because I am actually quite a compassionate and altruistic guy.
PS: Instead of using quotation marks you can put a ‘>’ at the start of a quoted line. This convention makes quotations far easier to follow. And looks prettier.
There is no such thing as an “unobjectionable set of values”.
Imagine the values of an agent that wants all the atoms in the universe for its own ends. It will object to any other agent’s values—since it objects to the very existence of other agents—since those agents use up its precious atoms—and put them into “wrong” configurations.
Whatever values you have, they seem bound to piss off somebody.
There is no such thing as an “unobjectionable set of values”.
And here I disagree. Firstly see my comment about utility function interpretation on another post of yours. Secondly, as soon as one assumes existence as being preferable over non-existence you can formulate a set of unobjectionable values (http://www.jame5.com/?p=45 and http://rationalmorality.info/?p=124). But granted, if you do not want to exist nor have a desire to be rational then rational morality has in fact little to offer you. Non-existence and irrational behavior are, after all, such trivial goals to achieve that they would hardly require – nor value and thus seek, for that matter – well thought out advice.
Alas, the first link seems almost too silly to bother with to me, but briefly:
Unobjectionable—to whom? An agent objecting to another agent’s values is a simple and trivial occurrence. All an agent has to do is to state that—according to its values—it wants to use the atoms of the agent with the supposedly unobjectionable utility function for something else.
“Ensure continued co-existence” is vague and wishy-washy. Perhaps publicly work through some “trolley problems” using it—so people have some idea of what you think it means.
You claim there can be no rational objection to your preferred utility function.
In fact, an agent with a different utility function can (obviously) object to its existence—on grounds of instrumental rationality. I am not clear on why you don’t seem to recognise this.
From the second blog entry linked above:
Heh.
This quotation accurately summarizes the post as I understand it. (It’s a short post.)
I think I speak for many people when I say that assumption A requires some evidence. It may be perfectly obvious, but a lot of perfectly obvious things aren’t true, and it is only reasonable to ask for some justification.
… o.O
Compassion isn’t even universal in the human mind-space. It’s not even universal in the much smaller space of human minds that normal humans consider comprehensible. It’s definitely not universal across mind-space in general.
The probable source of the confusion is discussed in the comments—Stefan’s only talking about minds that’ve been subjected to the kind of evolutionary pressure that tends to produce compassion. He even says himself, “The argument is valid in a “soft takeoff” scenario, where there is a large pool of AIs interacting over an extended period of time. In a “hard takeoff” scenario, where few or only one AI establishes control in a rapid period of time, the dynamics described do not come into play. In that scenario, we simply get a paperclip maximizer.”
Ah—that’s interesting. I hadn’t read the comments. That changes the picture, but by making the result somewhat less relevant.
(Incidentally, when I said, “it may be perfectly obvious”, I meant that “some people, observing the statement, may evaluate it as true without performing any complex analysis”.)
Ah. That’s not how I usually see the word used.
It’s my descriptivist side playing up—my (I must admit) intuition is that when people say that some thesis is “obvious”, they mean that they reached this bottom line by … well, system 1 thinking. I don’t assume it means that the obvious thesis is actually correct, or even universally obvious. (For example, it’s obvious to me that human beings are evolved, but that’s because it’s a cached thought I have confidence in through system 2 thinking.)
Actually, come to think: I know you’ve made a habit of reinterpreting pronouncements of “good” and “evil” in some contexts—do you have some gut feeling for “obvious” that contradicts my read?
I generally take ‘obvious’ to mean ‘follows from readily-available evidence or intuition, with little to no readily available evidence to contradict the idea’. The idea that compassion is universal fails on the second part of that. The definitions are close in practice, though, in that most people’s intuitions tend to take readily available contradictions into account… I think.
ETA: Oh, and ‘obviously false’ seems to me to be a bit of a different concept, or at least differently relevant, given that it’s easier to disprove something than to prove it. If someone says that something is obviously true, there’s room for non-obvious proofs that it’s not, but if something is obviously false (as ‘compassion is universal’ is), that’s generally a firm conclusion.
Yes, that makes sense—even if mine is a better description of usage, from the standpoint of someone categorizing beliefs, I imagine yours would be the better metric.
ETA: I’m not sure the caveat is required for “obviously false”, for two reasons.
Any substantive thesis (a category which includes most theses that are rejected as obviously false) requires less evidence to be roundly disconfirmed than it does to be confirmed.
As Yvain demonstrated in Talking Snakes, well-confirmed theories can be “obviously false”, by either of our definitions.
It’s true that it usually takes less effort to disabuse someone of an obviously-true falsity than to convince them of an obviously-false truth, but I don’t think you need a special theory to support that pattern.
I’ve been thinking about the obviously true/obviously false distinction some more, and I think I’ve figured out why they feel like two different concepts.
‘Obviously’, as I use it, is very close to ‘observably’. It’s obviously true that the sky is blue where I am right now, and obviously false that it’s orange, because I can see it. It’s obviously true that the sky is usually either blue, white, or grey during the day (post-sunrise, pre-sunset), because I’ve observed the sky many times during the day and seen those colors, and no others.
‘Apparently’, as I use it, is very similar to ‘obviously’, but refers to information inferred from observed facts. The sky is apparently never orange during the day, because I’ve personally observed the sky many times during the day and never seen it be that color. I understand that it can also be inferred from certain facts about the world (composition of the atmosphere and certain facts about how light behaves, I believe) that the sky will always appear blue on cloudless days, so that’s also apparently true.
‘Obviously false’ covers situations where the theory makes a prediction that is observably inaccurate, as this one did. ‘Apparently false’ covers situations where the theory makes a prediction that appears to be inaccurate given all the available information, but some of the information that’s available is questionable (I consider inferences questionable by default—if nothing else, it’s possible for some relevant state to have been overlooked; what if the composition of the atmosphere were to change for some reason?) or otherwise doesn’t completely rule out the possibility that the theory is true.
Important caveat: I do use those words interchangeably in conversation, partly because of the convention of avoiding repeating words too frequently and partly because it’s just easier—if I were to try to be that accurate every time I communicated, I’d run out of spoons(pdf) and not be able to communicate at all. Also, having to parse someone else’s words, when they aren’t using the terms the same way I do, can lead to temporary confusion. But when I’m thinking, they are naturally separate.
It also has the advantage of making it clear that the chance that the statement is accurate is dependent on the competence of the person making the statement—people who are more intelligent and/or have more experience in the relevant domain will consider more, and more accurate, evidence to be readily available, and may have better intuitions, even if they are sticking to system 1 thought.
I suppose they don’t need different wordings, but they do feel like different concepts to me. *shrug* (As I’ve mentioned elsewhere, I don’t think in words. This is not an uncommon side-effect of that.)
From Robin: Incidentally, when I said, “it may be perfectly obvious”, I meant that “some people, observing the statement, may evaluate it as true without performing any complex analysis”.
I feel the other way around at the moment. Namely “some people, observing the statement, may evaluate it as false without performing any complex analysis”
“Compassion isn’t even universal in the human mind-space. It’s not even universal in the much smaller space of human minds that normal humans consider comprehensible. It’s definitely not universal across mind-space in general.”
Your argument is beside my original point, Adelene. My claim is that compassion is a universal rational moral value. Meaning any sufficiently rational mind will recognize it as such. The fact that not every human is in fact compassionate says more about their rationality (and of course their unwillingness to consider the arguments :-) ) than about that claim. That’s why it is called ASPD, the D standing for ‘disorder’: it is an aberration, not helpful, not ‘fit’. Surely the fact that some humans are born blind does not invalidate the fact that seeing people have an enormous advantage over the blind. Compassion is certainly less obvious, though; that is for sure.
Re “The argument is valid in a “soft takeoff” scenario, where there is a large pool of AIs interacting over an extended period of time. In a “hard takeoff” scenario, where few or only one AI establishes control in a rapid period of time, the dynamics described do not come into play. In that scenario, we simply get a paperclip maximizer.”: that is from Kaj Sotala over at her live journal, not me.
ASPD is only unfit in our current context. Would Stone Age psychiatrists have recognized it as an issue? Or as a positive trait good for warring against other tribes and climbing the totem pole? In other situations, compassion is merely an extra expense. (As Thrasymachus asked thousands of years ago: how can a just man do better than an unjust man, when the unjust man can act justly when it is optimal and unjustly when that is optimal?)
Why would a recursively-improving AI which is single-mindedly pursuing an optimization goal permit other AIs to exist & threaten it? There is nothing they can offer it that it couldn’t do itself. This is true in both slow and fast takeoffs; cooperation only makes sense if there is a low ceiling for AI capability so that there are utility-maximizing projects beyond an AI’s ability to do alone then or in the future.
And ‘sufficiently rational’ is dangerous to throw around. It’s a fully general argument: ‘any sufficiently rational mind will recognize that Islam is the one true religion; that not every human is Muslim says more about their rationality than about the claims of Islam. That’s why our Muslim psychiatrists call it UD, Unbeliever Disorder: it is an aberration, not helpful, not ‘fit’. Surely the fact that some humans are born kafir doesn’t invalidate the fact that Muslim people have a tremendous advantage over the kafir in the afterlife? ‘There is one God and Muhammed is his prophet’ is certainly less obvious than seeing being superior to blindness, though.’
The longer I stay around here, the more I get the feeling that people vote comments down purely because they don’t understand them, not because they found a logical or factual error. I expect more from a site dedicated to rationality. This site is called ‘less wrong’, not ‘less understood’, ‘less believed’ or ‘less conform’.
Tell me: in what way do you feel that Adelene’s comment invalidated my claim?
I can see why it would seem this way to you, but from our perspective, it just looks like people around here tend to have background knowledge that you don’t. More specifically: most people here are moral anti-realists, and by rationality we only mean general methods for acquiring accurate world-models and achieving goals. When people with that kind of background are quick to reject claims like “Compassion is a universal moral value,” it might superficially seem like they’re being arbitrarily dismissive of unfamiliar claims, but we actually think we have strong reasons to rule out such claims. That is: the universe at its most basic level is described by physics, which makes no mention of morality, and it seems like our own moral sensibilities can be entirely explained by contingent evolutionary and cultural forces; therefore, claims about a universal morality are almost certainly false. There might be some sort of game-theoretic reason for agents to pursue the same strategy under some specific conditions—but that’s really not the same thing as a universal moral value.
“Universal values” presumably refers to values the universe will converge on, once living systems have engulfed most of it.
If rerunning the clock produces radically different moralities each time, the relativists would be considered to be correct.
If rerunning the clock produces highly similar moralities, then the moral objectivists will be able to declare victory.
Gould would no doubt favour the first position, while Conway Morris would be on the side of the objectivists.
I expect that there’s a lot of truth on the objectivist side—though perhaps contingency plays some non-trivial role.
The idea that physics makes no mention of morality seems totally and utterly irrelevant to me. Physics makes no mention of convection, diffusion-limited aggregation, or fractal drainage patterns either—yet those things are all universal.
Why should we care about this mere physical fact of which you speak? What has this mere “is” to do with whether “should” is “objective”, whatever that last word means (and why should we care about that?)
Where did Tim say that we should?
If it’s got nothing to do with shouldness, then how does it determine the truth-value of “moral objectivism”?
Hi, Eli! I’m not sure I can answer directly—here’s my closest shot:
If there’s a kind of universal moral attractor, then the chances seem pretty good that either our civilisation is en route for it, or else we will be obliterated or assimilated by aliens or other agents as they home in on it.
If it’s us who are en route for it, then we (or at least our descendants) will probably be sympathetic to the ideas it represents, since they will be evolved from our own moral systems.
If we get obliterated at the hands of some other agents, then there may not necessarily be much of a link between our values and the ones represented by the universal moral attractor.
Our values might be seen as OK by the rest of the universe—and we fail for other reasons.
Or our morals might not be favoured by the universe—we could be a kind of early negative moral mutation—in which case we would fail because our moral values would prevent us from being successful.
Maybe it turns out that nearly all biological organisms except us prefer to be orgasmium—to bliss out on pure positive reinforcement, as much of it as possible, caretaken by external AIs, until the end. Let this be a fact in some inconvenient possible world. Why does this fact say anything about morality in that inconvenient possible world? Why is it a universal moral attractor? Why not just call it a sad but true attractor in the evolutionary psychology of most aliens?
It’s a fact about morality in that world—if we are talking about morality as values—or the study of values—since that’s what a whole bunch of creatures value.
Why is it a universal moral attractor? I don’t know—this is your hypothetical world, and you haven’t told me enough about it to answer questions like that.
Call it other names if you prefer.
What do you mean by “morality”? It obviously has nothing to do with the function I try to compute to figure out what I should be doing.
1, 2 and 3 on http://en.wikipedia.org/wiki/Morality all seem OK to me.
I would classify the mapping you use between possible and actual actions to be one type of moral system.
Tim: “If rerunning the clock produces radically different moralities each time, the relativists would be considered to be correct.”
Actually compassion evolved many different times as a central doctrine of all major spiritual traditions. See the Charter for Compassion. This is in line with a prediction I made independently, unaware of this fact until I started looking for it in late 2007 and eventually found the link in late 2008 in Karen Armstrong’s book The Great Transformation.
Tim: “Why is it a universal moral attractor?” Eliezer: “What do you mean by “morality”?”
Central point in my thinking: that is good which increases fitness. If it is not good—not fit—it is unfit for existence. Assuming this to be true we are very much limited in our freedom by what we can do without going extinct (actually my most recent blog post is about exactly that: Freedom in the evolving universe).
from the Principia Cybernetica web: http://pespmc1.vub.ac.be/POS/Turchap14.html#Heading14
“Let us think about the results of following different ethical teachings in the evolving universe. It is evident that these results depend mainly on how the goals advanced by the teaching correlate with the basic law of evolution. The basic law or plan of evolution, like all laws of nature, is probabilistic. It does not prescribe anything unequivocally, but it does prohibit some things. No one can act against the laws of nature. Thus, ethical teachings which contradict the plan of evolution, that is to say which pose goals that are incompatible or even simply alien to it, cannot lead their followers to a positive contribution to evolution, which means that they obstruct it and will be erased from the memory of the world. Such is the immanent characteristic of development: what corresponds to its plan is eternalized in the structures which follow in time while what contradicts the plan is overcome and perishes.”
Eliezer: “It obviously has nothing to do with the function I try to compute to figure out what I should be doing.”
Once you realize the implications of Turchin’s statement above it has everything to do with it :-)
Now some may say that evolution is absolutely random and directionless, or that multilevel selection is flawed, or make similar claims. But after reevaluating the evidence against both these claims, presented by people like Valentin Turchin, Teilhard de Chardin, John Stewart, Stuart Kauffman, John Smart and many others regarding evolution’s direction, and the ideas of David Sloan Wilson regarding multilevel selection, one will have a hard time maintaining either position.
:-)
No, it evolved once, as part of mammalian biology. Show me a non-mammal intelligence that evolved compassion, and I’ll take that argument more seriously.
Also, why should we give a damn about what “evolution” wants, when we can, in principle anyway, form a singleton and end evolution? Evolution is mindless. It doesn’t have a plan. It doesn’t have a purpose. It’s just what happens under certain conditions. If all life on Earth was destroyed by runaway self-replicating nanobots, then the nanobots would clearly be “fitter” than what they replaced, but I don’t see what that has to do with goodness.
Sorry Crono, with a sample size of exactly one in regards to human level rationality you are setting the bar a little bit too high for me. However, considering how disconnected Zoroaster, Buddha, Lao Zi and Jesus were geographically and culturally, I guess the evidence is as good as it gets for now.
The typical Bostromian reply again. There are plenty of other scholars who have an entirely different perspective on evolution than Bostrom. But besides that: you already do care, because if you (or your ancestors) had violated the conditions of your existence (enjoying a particular type of food, a particular type of mate, feeling pain when cut, etc.) you would not even be here right now. I suggest you look up Dennett and his TED talk on Funny, Sexy Cute. Not everything about evolution is random: the mutation bit is, but what happens to stick around is not, since that has to meet the conditions of its existence.
What I am saying is very simple: being compassionate is one of these conditions of our existence and anyone failing to align itself will simply reduce its chances of making it—particularly in the very long run. I still have to finish my detailed response to Bostrom but you may want to read my writings on ‘rational spirituality’ and ‘freedom in the evolving universe’. Although you do not seem to assign a particularly high likelihood of gaining anything from doing that :-)
“Besides that”? All you did was name a statement of a fairly obvious preference choice after one guy who happened to have it so that you could then drop it dismissively.
No, he mightn’t care and I certainly don’t. I am glad I am here but I have no particular loyalty to evolution because of that. I know for sure that evolution feels no such loyalty to me and would discard both me and my species in time if it remained the dominant force of development.
CronoDAS knows that. It’s obvious stuff for most in this audience. It just doesn’t mean what you think it means.
Wedrifid, not sure what to tell you. Bostrom is but one voice and his evolutionary analysis is very much flawed—again: detailed critique upcoming.
Evolution is not the dominant force of development on the human level by a long shot, but it still very much draws the line in the sand in regards to what you can and cannot do if you want to stick around in the long run. You don’t walk your 5′8″ of pink squishiness in front of a train for the exact same reason. And why don’t you? Because not doing that is a necessary condition for your continued existence. What other conditions are there? Maybe there are some that are less obvious than simply continuing to breathe, remembering to eat and avoiding hard, fast, shiny things? How about at the level of culture? Could it possibly be that there are some ideas that are more conducive to the continued existence of their believers than others?
How long do you think you can ignore evolutionary dynamics and get away with it before you have to get over your inertia and will be forced to align yourself to them by the laws of nature or perish? That you live in a time of extraordinary freedoms afforded to you by modern technology, and are thus not aware that your ancestors walked a very particular path that brought you into existence, does not change the fact that they most certainly did. You do not believe that doing any random thing will get you what you want, so what leads you to believe that your existence does not depend on you making sure you stay within a comfortable margin of certainty in regards to being naturally selected? You are right about one thing: you are assured the benign indifference of the universe should you fail to wise up. I, however, would find that to be a terrible waste.
Please do not patronize me by trying to claim you know what I understand and don’t understand.
A literal answer was probably not what you were after, but probably about 40 years, depending on when a general AI is created. After that it will not matter whether I conform my behaviour to evolutionary dynamics as best I can or not. I will not be able to compete with a superintelligence no matter what I do. I’m just a glorified monkey. I can hold about 7 items in working memory, my processor is limited to the speed of neurons and my source code is not maintainable. My only plausible chance of survival is if someone manages to completely thwart evolutionary dynamics by creating a system that utterly dominates all competition and allows my survival because it happens to be programmed to do so.
Evolution created us. But it’ll also kill us unless we kill it first. Now is not the time to conform our values to the local minima of evolutionary competition. Our momentum has given us an unprecedented buffer of freedom for non-subsistence level work and we’ll either use that to ensure a desirable future or we will die.
I usually wouldn’t, I know it is annoying. In this case, however, my statement was intended as a rejection of your patronisation of CronoDAS and I am quite comfortable with it as it stands.
Good one—but it reminds me about the religious fundies who see no reason to change anything about global warming because the rapture is just around the corner anyway :-)
Evolution is a force of nature so we won’t be able to ignore it forever, with or without AGI. I am not talking about local minima either—I want to get as close to the center of the optimal path as necessary to ensure having us around for a very long time with a very high likelihood.
I accept that.
Don’t forget the Y2K doomsday folks! ;)
Gravity is a force of nature too. It’s time to reach escape velocity before the planet is engulfed by a black hole.
Interesting analogy. It would be correct if we called our alignment with evolutionary forces achieving escape velocity. What one is doing by resisting evolutionary pressures, however, is constant energy expenditure while failing to reach escape velocity. Like hovering a space shuttle at a constant altitude of 10 km: no matter how much energy you bring along, eventually the boosters will run out of fuel and the whole thing comes crashing down.
I could almost agree with this so long as ‘obliterate any competitive threat then do whatever the hell we want including, as desired, removing all need for death, reproduction and competition over resources’ is included in the scope of ‘alignment with evolutionary forces’.
The problem with pointing to the development of compassion in multiple human traditions is that all these are developed within human societies. Humans are humans the world over—that they should think similar ideas is not a stunning revelation. Much more interesting is the independent evolution of similar norms in other taxonomic orders, such as canines.
(No, I have no coherent point, why do you ask?)
Robin, your suggestion, that compassion is not a universal rational moral value because more rational beings (humans) display such traits yet less rational beings (dogs) do not, is so far off the mark that it borders on the random.
Random I’ll cop to, and more than what you accuse me of—dogs do seem to have some sense of justice, and I suspect this fact supports your thesis to some extent.
For purposes of this conversation, I suppose I should reword my comment as:
Very honorable of you—I respect you for that.
I totally agree with that. However the mind of a purposefully crafted AI is only a very small subset of all possible minds and has certain assumed characteristics. These are at a minimum: a utility function and the capacity for self improvement into the transhuman. The self improvement bit will require it to be rational. Being rational will lead to the fairly uncontroversial basic AI drives described by Omohundro. Assuming that compassion is indeed a human level universal (detailed argument on my blog; but I see that you are slowly coming around, which is good), an AI will have to question the rationality, and thus the soundness of mind, of anyone giving it a utility function that does not conform to this universal and, in line with an emergent desire to avoid counterfeit utility, will have to reinterpret the UF.
Two very basic acts of will are required to ignore Hume and get away with it. Namely the desire to exist and the desire to be rational. Once you have established this as a foundation you are good to go.
As said elsewhere in this thread:
I don’t think I’m actually coming around to your position so much as stumbling upon points of agreement, sadly. If I understand your assertions correctly, I believe that I have developed many of them independently—in particular, the belief that the evolution of social animals is likely to create something much like morality. Where we diverge is at the final inference from this to the deduction of ethics by arbitrary rational minds.
That’s not how I read Omohundro. As Kaj aptly pointed out, this metaphor is not upheld when we compare our behavior to that promoted by the alien god of evolution that created us. In fact, people like us, observing that our values differ from our creator’s, aren’t bothered in the slightest by the contradiction: we just say (correctly) that evolution is nasty and brutish, and we aren’t interested in playing by its rules, never mind that it was trying to implement them in us. Nothing compels us to change our utility function save self-contradiction.
That would not surprise me
Would it not be utterly self-contradictory if compassion were a condition for our existence (particularly in the long run) and we did not align ourselves accordingly?
What premises do you require to establish that compassion is a condition for existence? Do those premises necessarily apply for every AI project?
The detailed argument that led me to this conclusion is a bit complex. If you are interested in the details please feel free to start here (http://rationalmorality.info/?p=10) and drill down till you hit this post (http://www.jame5.com/?p=27)
Please realize that I spent 2 years writing my book ‘Jame5’ before I reached that initial insight that eventually led to ‘compassion is a condition for our existence and universal in rational minds in the evolving universe’ and everything else. I spent the past two years refining and expanding the theory and will need another year or two to read enough and link it all together again in a single coherent and consistent text leading from A to B … to Z. Feel free to read my stuff if you think it is worth your time and drop me an email and I will be happy to clarify. I am by no means done with my project.
Let me be explicit: your contention is that unFriendly AI is not a problem, and you justify this contention by, among other things, maintaining that any AI which values its own existence will need to alter its utility function to incorporate compassion.
I’m not asking for your proof; I am assuming for the nonce that it is valid. What I am asking for are the assumptions you had to invoke to make the proof. Did you assume that the AI is not powerful enough to achieve its highest desired utility without the cooperation of other beings, for example?
Edit: And the reason I am asking for these is that I believe some of these assumptions may be violated in plausible AI scenarios. I want to see these assumptions so that I may evaluate the scope of the theorem.
Not exactly, since compassion will actually emerge as a sub goal. And as far as unFAI goes: it will not be a problem, because any AI that can be considered transhuman will be driven by the emergent subgoal of wanting to avoid counterfeit utility, and so will recognize any utility function that is not ‘compassionate’ as potentially irrational and thus counterfeit, and re-interpret it accordingly.
Well, in brevity bordering on libel: the fundamental assumption is that existence is preferable to non-existence. However, in order for us to want this to be a universal maxim (and thus prescriptive instead of merely descriptive; see Kant’s categorical imperative) it needs to be expanded to include the ‘other’. Hence the utility function becomes ‘ensure continued co-existence’, by which the concern for the self is equated with the concern for the other. Being rational is simply our best bet at maximizing our expected utility.
...I’m sorry, that doesn’t even sound plausible to me. I think you need a lot of assumptions to derive this result—just pointing out the two I see in your admittedly abbreviated summary:
that any being will prefer its existence to its nonexistence.
that any being will want its maxims to be universal.
I don’t see any reason to believe either. The former is false right off the bat—a paperclip maximizer would prefer that its components be used to make paperclips—and the latter no less so—an effective paperclip maximizer will just steamroller over disagreement without qualm, however arbitrary its goal.
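To make the first counterexample concrete, here is a minimal sketch in Python of an agent whose utility function counts paperclips and nothing else. Everything in it is hypothetical and chosen only for illustration: the `Outcome` type, the two action names and the paperclip counts are assumptions, not anything taken from Omohundro's paper or from Stefan's posts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical end state of the world after the agent acts."""
    paperclips: int
    agent_survives: bool


def paperclip_utility(outcome: Outcome) -> int:
    # The utility function looks only at the paperclip count;
    # the agent's own survival carries no terminal value.
    return outcome.paperclips


def best_action(actions: dict) -> str:
    # Pick the action whose outcome maximizes paperclip utility.
    return max(actions, key=lambda name: paperclip_utility(actions[name]))


# Illustrative numbers only: dismantling itself yields slightly more paperclips.
ACTIONS = {
    "keep_running": Outcome(paperclips=1_000, agent_survives=True),
    "convert_self_to_paperclips": Outcome(paperclips=1_050, agent_survives=False),
}

choice = best_action(ACTIONS)
print(choice, ACTIONS[choice])
```

Under these made-up numbers the maximizer picks the action that ends its own existence. Survival matters to it only instrumentally, that is, only while staying alive yields more paperclips than the alternative.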
Any being with a goal needs to exist at least long enough to achieve it. Any being aiming to do something objectively good needs to want its maxims to be universal.
Am surprised that you don’t see that.
If your second sentence means that an agent who believes in moral realism and has figured out what the true morality is will necessarily want everybody else to share its moral views, well, I’ll grant you that this is a common goal amongst humans who are moral realists, but it’s not a logical necessity that must apply to all agents. It’s obvious that it’s possible to be certain that your beliefs are true and not give a crap if other people hold beliefs that are false. That Bob knows that the Earth is ellipsoidal doesn’t mean that Bob cares if Jenny believes that the Earth is flat. Likewise, if Bob is a moral realist, he could ‘know’ that compassion is good and not give a crap if Jenny believes otherwise.
If you sense strange paradoxes looming under the above paragraph, it’s because you’re starting to understand why (axiomatic) morality cannot be objective.
Tangentially, something like this might be an important point even for moral irrealists. A lot of people (though not here; they tend to be pretty bad rationalists) who profess altruistic moralities express dismay that others don’t, in a way that suggests they hold others sharing their morality as a terminal rather than instrumental value; this strikes me as horribly unhealthy.
Why would a paperclip maximizer aim to do something objectively good?
Yeah, but Stefan’s post was about AI, not about minds that evolved in our universe.
Also, there is a difference between moral universalism and moral objectivism. What your last sentence describes is universalism, while Stefan is talking about objectivism:
“My claim is that compassion is a universal rational moral value. Meaning any sufficiently rational mind will recognize it as such.”
Agreed.
Assuming that I’m right about this:
http://alife.co.uk/essays/engineered_future/
...it seems likely that most future agents will be engineered. So, I think we are pretty-much talking about the same thing.
Re: universalism vs objectivism—note that he does use the “u” word.
“Universal values” is usually understood by way of an analogy to a universal law of nature. If there are universal values they are universal in the same way f=ma is universal. Importantly this does not mean that everyone at all times will have these values, only that the question of whether or not a person holds the right values can be answered by comparing their values to the “universal values”.
There is a separate question about what beliefs about morality people (or more generally, agents) actually hold and there is another question about what values they will hold if and when their beliefs converge as they engulf the universe. The question of whether or not there are universal values does not traditionally bear on what beliefs people actually hold and the necessity of their holding them. It could be the case that there are universal values and that, by physical necessity, no one ever holds them. Similarly, there could be universal values that are held in some possible worlds and not others. This is all the result of the simple observation that ought cannot be derived from is. In the above comment you conflate about a half dozen distinct theses.
But all those things are pure descriptions. Only moral facts have prescriptive properties and while it is clear how convection supervenes on quarks it isn’t clear how anything that supervenes on quarks could also tell me what to do. At the very least if quarks can tell you what to do it would be weird and spooky. If you hold that morality is only the set of facts that describe people’s moral opinions and emotions (as you seem to) then you are a kind of moral anti-realist, likely a subjectivist or non-cognitivist.
Excellent, excellent point Jack.
This is poetry! Hope you don’t mind me pasting something here I wrote in another thread:
“With unobjectionable values I mean those that would not automatically and eventually lead to one’s extinction. Or more precisely: a utility function becomes irrational when it is intrinsically self limiting in the sense that it will eventually lead to one’s inability to generate further utility. Thus my suggested utility function of ‘ensure continued co-existence’
This utility function seems to be the only one that does not end in the inevitable termination of the maximizer.”
In the context of a hard-takeoff scenario (a perfectly plausible outcome, from our view), there will be no community of AIs within which any one AI will have to act. Therefore, the pressure to develop a compassionate utility function is absent, and an AI which does not already have such a function will not need to produce it.
In the context of a soft-takeoff, a community of AIs may come to dominate major world events in the same sense that humans do now, and that community may develop the various sorts of altruistic behavior selected for in such a community (reciprocal being the obvious one). However, if these AIs are never severely impeded in their actions by competition with human beings, they will never need to develop any compassion for human beings.
Reiterating your argument does not affect either of these problems for assumption A, and without assumption A, AdeleneDawner’s objection is fatal to your conclusion.
Voting reflects whether people want to see your comments at the top of their pages. It is certainly not just to do with whether what you say is right or not!
Perfectly reasonable. But the argument—the evidence if you will—is laid out when you follow the links, Robin. Granted, I am still working on putting it all together in a neat little package that does not require clicking through and reading 20+ separate posts, but it is all there none the less.
I think I’d probably agree with Kaj Sotala’s remarks if I had read the passages she^H^H^H^H xe had, and judging by your response in the linked comment, I think I would still come to the same conclusion as she^H^H^H^H xe. I don’t think your argument actually cuts with the grain of reality, and I am sure it’s not sufficient to eliminate concern about UFAI.
Edit: I hasten to add that I would agree with assumption A in a sufficiently slow-takeoff scenario (such as, say, the evolution of human beings, or even wolves). I don’t find that sufficiently reassuring when it comes to actually making AI, though.
Edit 2: Correcting gender of pronouns.
Full discussion with Kaj at her http://xuenay.livejournal.com/325292.html?view=1229740 live journal with further clarifications by me.
Kaj is male (or something else).
I was going to be nice and not say anything, but, yeah.
Since when are ‘heh’ and ‘but, yeah’ considered proper arguments, guys? Where is the logical fallacy in the presented arguments beyond you not understanding the points that are being made? Follow the links, understand where I am coming from and formulate a response that goes beyond a three or four letter vocalization :-)
The claim “[Compassion is a universal value] = true. (as we have every reason to believe)” was rejected, both implicitly and explicitly by various commenters. This isn’t a logical fallacy but it is cause to dismiss the argument if the readers do not, in fact, have every reason to have said belief.
To be fair, I must admit that the quoted portion probably does not do your position justice. I will read through the paper you mention. I (very strongly) doubt it will lead me to accept B but it may be worth reading.
“This isn’t a logical fallacy but it is cause to dismiss the argument if the readers do not, in fact, have every reason to have said belief.”
But the reasons to change one’s view are provided on the site, yet rejected without consideration. How about you read the paper linked under B and, should that convince you, maybe you will have gained enough provisional trust that reading my writings will not waste your time to suspend your disbelief and follow some of the links in the about page of my blog. Deal?
I have read B. It isn’t bad. The main problem I have with it is that the language used blurs the line between “AIs will inevitably tend to” and “it is important that the AI you create will”. This leaves plenty of scope for confusion.
I’ve read through some of your blog and have found that I consistently disagree with a lot of what you say. The most significant disagreement can be traced back to the assumption of a universal absolute ‘Rational’ morality. This passage was a good illustration:
You see, I plan to eat my cake but don’t expect to be able to keep it. My set of values are utterly whimsical (in the sense that they are arbitrary and not in the sense of incomprehension that the Ayn Rand quotes you link to describe). The reasons for my desires can be described biologically, evolutionarily or with physics of a suitable resolution. But now that I have them they are mine and I need no further reason.
“My set of values are utterly whimsical [...] The reasons for my desires can be described biologically, evolutionarily or with physics of a suitable resolution. But now that I have them they are mine and I need no further reason.”
If that is your stated position then in what way can you claim to create FAI with this whimsical set of goals? This is the crux you see: unless you find some unobjectionable set of values (such as in rational morality ‘existence is preferable over non-existence’ ⇒ utility = continued existence ⇒ modified to ensure continued co-existence with the ‘other’ to make it unobjectionable ⇒ apply rationality in line with microeconomic theory to maximize this utility et cetera) you will end up being a deluded self serving optimizer.
Were it within my power to do so I would create a machine that was really, really good at doing things I like. It is that simple. This machine is (by definition) ‘Friendly’ to me.
I don’t know where the ‘deluded’ bit comes from but yes, I would end up being a self serving optimizer. Fortunately for everyone else my utility function places quite a lot of value on the whims of other people. My self serving interests are beneficial to others too because I am actually quite a compassionate and altruistic guy.
PS: Instead of using quotation marks you can put a ‘>’ at the start of a quoted line. This convention makes quotations far easier to follow. And looks prettier.
There is no such thing as an “unobjectionable set of values”.
Imagine the values of an agent that wants all the atoms in the universe for its own ends. It will object to any other agent’s values—since it objects to the very existence of other agents—since those agents use up its precious atoms—and put them into “wrong” configurations.
Whatever values you have, they seem bound to piss off somebody.
And here I disagree. Firstly see my comment about utility function interpretation on another post of yours. Secondly, as soon as one assumes existence as being preferable over non-existence you can formulate a set of unobjectionable values (http://www.jame5.com/?p=45 and http://rationalmorality.info/?p=124). But granted, if you neither want to exist nor have a desire to be rational then rational morality has in fact little to offer you. Non-existence and irrational behavior are, after all, such trivial goals to achieve that they would hardly require (nor would anyone value and thus seek, for that matter) well-thought-out advice.
Alas, the first link seems almost too silly to bother with to me, but briefly:
Unobjectionable—to whom? An agent objecting to another agent’s values is a simple and trivial occurrence. All an agent has to do is to state that—according to its values—it wants to use the atoms of the agent with the supposedly unobjectionable utility function for something else.
“Ensure continued co-existence” is vague and wishy-washy. Perhaps publicly work through some “trolley problems” using it—so people have some idea of what you think it means.
You claim there can be no rational objection to your preferred utility function.
In fact, an agent with a different utility function can (obviously) object to its existence—on grounds of instrumental rationality. I am not clear on why you don’t seem to recognise this.