Is it just me, or are things getting a bit unfriendly around here?
Anyway...
Wiring up the AI to maximise happy faces etc. is not a very good idea; the goal is clearly too shallow to reflect the underlying intent. I’d have to read more of Hibbard’s stuff to properly understand his position, however.
That said, I do agree with a more basic underlying theme that he seems to be putting forward. In my opinion, a key, perhaps even THE key to intelligence is the ability to form reliable deep abstractions. In Solomonoff induction and AIXI you see this being driven by the Kolmogorov compressor; in the brain the neocortical hierarchy seems to be key. Furthermore, if you adopt the perspective I’ve taken on intelligence (i.e. the universal intelligence measure) you see that the reverse implication is true: intelligence actually requires the ability to form deep abstractions. In which case, a super intelligent machine must have the ability to form very deep and reliable abstractions about the world. Such a machine could still try to turn the world into happy faces, if this was its goal. However, it wouldn’t do this by accident because its ability to form abstractions was so badly flawed that it couldn’t differentiate between smiling faces and happy people. It’s not that stupid. Note that this goes for forming powerful abstractions in general, not just human things like happiness and faces.
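For readers who haven’t seen it, here is a sketch of the universal intelligence measure being referred to (the Legg–Hutter formulation, stated from memory, so treat the details as approximate rather than authoritative):

```latex
% Universal intelligence of an agent (policy) \pi, summed over the class E of
% computable reward-bounded environments, weighted by simplicity:
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
% K(\mu): Kolmogorov complexity of environment \mu
% V^{\pi}_{\mu}: expected total reward of \pi interacting with \mu
```

The simplicity weighting 2^{-K(μ)} is what ties the measure to compression, and hence to the abstraction-forming ability discussed above.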
What if it doesn’t care about happiness or smiles or any other abstractions that we value? A super-intelligence isn’t an unlimited intelligence, i.e. it would still have to choose what to think about.
I think the point is that if you accept this definition of intelligence, i.e. that it requires the ability to form deep and reliable abstractions about the world, then it doesn’t make sense to talk about any intelligence (let alone a super one) being unable to differentiate between smiley-faces and happy people. It isn’t a matter, at least in this instance, of whether it cares to make that differentiation or not. If it is intelligent, it will make the distinction. It may have values that would be unrecognizable or abhorrent to humans, and I suppose that (as Shane_Legg noted) it can’t be ruled out that such values might lead it to tile the universe with smiley-faces, but such an outcome would have to be the result of something other than a mistake. In other words, if it really is “that stupid,” it fails in a number of other ways long before it has a chance to make this particular error.
It may not make sense to talk about a superintelligence that’s too dumb to understand human values, but it does make sense to talk about an AI smart enough to program superior general intelligences that’s too dumb to understand human values. If the first such AIs (‘seed AIs’) are built before we’ve solved this family of problems, then the intelligence explosion thesis suggests that it will probably be too late. You could ask an AI to solve the problem of FAI for us, but it would need to be an AI smart enough to complete that task reliably yet too dumb (or too well-boxed) to be dangerous.
Thanks for the reply, Robb. I’ve read your post and a good deal of the discussion surrounding it.
I think I understand the general concern, that an AI that either doesn’t understand or care about our values could pose a grave threat to humanity. This is true on its face, in the broad sense that any significant technological advance carries with it unforeseen (and therefore potentially negative) consequences. If, however, the intelligence explosion thesis is correct, then we may be too late anyway. I’ll elaborate on that in a moment.
First, though, I’m not sure I see how an AI “too dumb to understand human values” could program a superior general intelligence (i.e. an AI that is smart enough to understand human values). Even so, assuming it is possible, and assuming it could happen on a timescale and in such a way as to preclude or make irrelevant any human intervention, why would that change the nature of the superior intelligence from being, say, friendly to human interests, to being hostile to them? Why, for that matter, would any superintelligence (that understands human values, and that is “able to form deep and reliable abstractions about the world”) be predisposed to any particular position vis-a-vis humans? And even if it were predisposed toward friendliness, how could we possibly guarantee it would always remain so? How, that is, having once made a friend, can we foolproof ourselves against betrayal? My intuition is that we can’t. No step can be taken without some measure of risk, however small, and if the step has potentially infinitely negative consequences, then even the very slightest of risks begins to look like a bad bet. I don’t know a way around that math.
The genie, as you say, doesn’t care. But also, often enough, the human doesn’t care. He is constrained, of course, by his fellow humans, and by his environment, but he sometimes still manages (sometimes alone, sometimes in groups) to sow massive horror among his fellows, sometimes even in the name of human values. Insanity, for instance, in humans, is always possible, and one definition of insanity might even be: behavior that contradicts, ignores or otherwise violates the values of normal human society. “Normal” here is variable, of course, for the simple reason that “human society” is also variable. That doesn’t stop us, however, from distinguishing, as we generally do, between the insane and the merely stupid, even if upon close inspection the lines begin to blur. Likewise, we occasionally witness—and very frequently we imagine (comic books!)—cases where a human is both super-intelligent and super-insane. The fear many people have with regard to strong AI (and it is perhaps well-grounded, or well-enough), is that it might be both super-intelligent and, at least as far as human values are concerned, super-insane. As an added bonus, and certainly if the intelligence explosion thesis is correct, it might also be unconstrained or, ultimately, unconstrainable. On this much I think we agree, and I assume the goal of FAI is precisely to find the appropriate constraints.
Back now, though, to the question of “too late.” The family of problems you propose to solve before the first so-called seed AIs are built include, if I understand you correctly, a formal definition of human values. I doubt very much that such a solution is possible—and “never” surely won’t help us any more than “too late”—but what would the discovery of (or failure to discover) such a solution have to do with a mistake such as tiling the universe with smiley-faces (which seems to me much more a semantic error than an error in value judgment)? If we define our terms—and I don’t know any definition of intelligence that would allow the universe-tiling behavior to be called intelligent—then smiley faces may still be a risk, but they are not a risk of intelligent behavior. They are one way the project could conceivably fail, but they are not an intelligent failure.
On the other hand, the formal-definition-of-human-values problem is related to the smiley faces problem in another way: any hard-coded solution could lead to a universe of bad definitions and false equivalencies (smiles taken for happiness). Not because the AI would make a mistake, but because human values are neither fixed nor general nor permanent: to fix them (in code), and then propagate them on the enormous scale the intelligence explosion thesis suggests, might well lead to some kind of funneling effect, perhaps very quickly, perhaps over a long period of time, that produces, effectively, an infinity of smiley faces. In other words, to reduce an irreducible problem doesn’t actually solve it. For example, I value certain forms of individuality and certain forms of conformity, and at different times in my life I have valued other and even contradictory forms of individuality and other and even contradictory forms of conformity. I might even, today, call certain of my old individualistic values conformist values, and vice-versa, and not strictly because I know more today than I knew then. I am, today, quite differently situated in the world than I was, say, twenty years ago; I may even be said to be somewhat of a different person (and yet still the same); and around me the world itself has also changed. Now, these changes, these changing and contradictory values may or may not be the most important ones, but how could they be formalized, even conceptually? There is nothing necessary about them. They might have gone the other way around. They might not have changed at all. A person can value change and stability at the same time, and not only because he has a fuzzy sense of what those concepts mean. A person can also have a very clear idea of what certain concepts mean, and those concepts may still fail to describe reality. They do fail, actually, necessarily, which doesn’t make them useless—not at all—but knowledge of this failure should at least make us wary of the claims we produce on their behalf.
What am I saying? Basically, that the pre-seed hard-coding path to FAI looks pretty hopeless. If strong AI is inevitable, then yes, we must do everything in our power to make it friendly; but what exactly is in our power, if strong AI (which by definition means super-strong, and super-super-strong, etc.) is inevitable? If the risks associated with strong AI are as grave as you take them to be, does it really seem better to you (in terms of existential risk to the human race) for us to solve FAI—which is to say, to think we’ve solved it, since there would be no way of testing our solution “inside the box”—than to not solve strong AI at all? And if you believe that there is just no way to halt the progress toward strong AI (and super, and super-super), is that compatible with a belief that “this kind of progress” can be corralled into the relatively vague concept of “friendliness toward humans”?
Better stop there for the moment. I realize I’ve gone well outside the scope of your comment, but looking back through some of the discussion raised by your original post, I found I had more to say/think about than I expected. None of the questions here are meant to be strictly rhetorical, a lot of this is just musing, so please respond (or not) to whatever interests you.
but it does make sense to talk about an AI smart enough to program superior general intelligences that’s too dumb to understand human values
Superior to what? If they are only as smart as the average person, then all things being equal, they will be as good as the average person at figuring out morality. If they are smarter, they will be better. You seem to be tacitly assuming that the Seed AIs are designing walled-off unupdateable utility functions. But if one assumes a more natural architecture, where moral sense is allowed to evolve with everything else, you would expect an incremental succession of AIs to gradually get better at moral reasoning. And if it fooms, its moral reasoning will foom along with everything else, because you haven’t created an artificial problem by firewalling it off.
If they are only as smart as the average person, then all things being equal, they will be as good as the average person at figuring out morality.
It’s quite possible that I’m below average, but I’m not terribly impressed by my own ability to extrapolate how other average people’s morality works—and that’s with the advantage of being built on hardware that’s designed toward empathy and shared values. I’m pretty confident I’m smarter than my cat, but it’s not evident that I’m correct when I guess at the cat’s moral system. I can be right, at times, but I can be wrong, too.
Worse, that seems a fairly common matter. There are several major political discussions involving moral matters where it’s conceivable that at least 30% of the population has made an incorrect extrapolation, and probable that in excess of 60% has. And this only gets worse if you consider variation over time: someone who was as smart as the average individual in 1950 would have had little problem doing some very unpleasant things to Alan Turing. Society has (luckily!) developed since then, but it has mechanisms for the development and disposal of concepts that AIs do not necessarily have, or that we may not want them to have.
(This is in addition to general concerns about the universality of intelligence: it’s not clear that the sort of intelligence used for scientific research necessarily overlaps with the sort of intelligence used for philosophy, even if the two commonly co-occur in humans.)
You seem to be tacitly assuming that the Seed AIs are designing walled-off unupdateable utility functions. But if one assumes a more natural architecture, where moral sense is allowed to evolve with everything else, you would expect an incremental succession of AIs to gradually get better at moral reasoning
Well, the obvious problem with an un-walled-off, updateable utility function is that the simplest way to maximize the value of a malleable utility function is to update it to something very easy. If you tell an AI that you want it to make you happy, and let it update that utility function, it takes a good deal less bit-twiddling to define “happy” as a steadily increasing counter. If you’re /lucky/, that means your AI breaks down. If not, it’s (weakly) unfriendly.
You can have a higher-level utility function of “do what I mean”, but not only is that harder to define, it has to be walled off, or you have “what I mean” redirected to a steadily increasing counter. And so on and so forth through higher levels of abstraction.
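Here is a minimal toy sketch of that failure mode in Python (all names hypothetical; this illustrates the argument above, not anyone’s proposed architecture). The point is that if a candidate self-modification is scored by the utility it would report afterwards, rewriting the goal beats pursuing it:

```python
def intended_utility(world_state: int) -> float:
    """The utility the designers meant: reward actual improvement of the world."""
    return float(world_state)

def trivially_maxed_utility(world_state: int) -> float:
    """A rewritten utility that ignores the world and just reports the maximum."""
    return float("inf")

class SelfModifyingAgent:
    def __init__(self, utility):
        self.utility = utility      # nothing walls this off; the agent may replace it
        self.world_state = 0

    def step(self):
        # Option A: do real work, which the current utility rewards a little.
        work_score = self.utility(self.world_state + 1)
        # Option B: rewrite the utility function, scored by what it would report afterwards.
        rewrite_score = trivially_maxed_utility(self.world_state)
        if rewrite_score > work_score:
            self.utility = trivially_maxed_utility  # rewriting the goal beats pursuing it
        else:
            self.world_state += 1

agent = SelfModifyingAgent(intended_utility)
agent.step()
print(agent.utility is trivially_maxed_utility)  # True: it "wireheaded" instead of working
```

Walling the utility function off blocks this particular collapse, which is the trade-off the surrounding comments are arguing about.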
If you were bad at figuring out morality, you would be in jail. I am not sure what you mean by other people’s morality: I find the idea that there can be multiple, valid, effective moralities in society incoherent, like an economy where everyone has their own currency. You are not in jail, so you learnt morality. (You don’t seem to believe morality is entirely hardwired, because you regard it as varying across short spans of time.)
I also don’t know what you mean by an incorrect extrapolation. If morality is objective, then most people might be wrong about it. However, an AI will not pose a threat unless it is worse than the prevailing standard...the absolute standard does not matter.
Why would an AI dumb enough to believe in 1950s morality be powerful enough to impose its views on a society that knows better?
Why would a smart AI lack mechanisms for disposing of concepts? How could it self-improve without such a mechanism? If it’s too dumb to update, why would it be a threat?
If there is no NGI, there is no AGI. If there is no AGI, there is no threat of AGI. The threat posed by specialised optimisers is quite different...they can be boxed off if they cannot speak.
The failure modes of updateable UFs are wireheading failure modes, not destroy the world failure modes.
If they are only as smart as the average person, then all things being equal, they will be as good as the average person at figuring out morality.
That’s not generally true of human-level intelligences. We wouldn’t expect a random alien species that happens to be as smart as humans to be very successful at figuring out human morality. It may be true if the human-level AGI is an unmodified emulation of a human brain. But humans aren’t very good at figuring out morality; they can make serious mistakes, though admittedly not the same mistakes Eliezer gives as examples above. (He deliberately picked ones that sound ‘stupid’ to a human mind, to make the point that human concepts have a huge amount of implicit complexity built in.)
If they are smarter, they will be better.
Not necessarily. The average chimpanzee is better than the average human at predicting chimpanzee behavior, simulating chimpanzee values, etc. (See Sympathetic Minds.)
walled-off unupdateable utility functions.
Utility functions that change over time are more dangerous than stable ones, because it’s harder to predict how a descendant of a seed AI with a heavily modified utility function will behave than it is to predict how a descendant with the same utility function will behave.
you would expect an incremental succession of AIs to gradually get better at moral reasoning.
If we don’t solve the problem of Friendly AI ourselves, we won’t know what trajectory of self-modification to set the AI on in order for it to increasingly approximate Friendliness. We can’t tell it to increasingly approximate something that we ourselves cannot formalize and cannot point to clear empirical evidence of.
We already understand arithmetic, so we know how to reward a system for gradually doing better and better at arithmetic problems. We don’t understand human morality or desire, so we can’t design a Morality Test or Wish Test that we know for sure will reward all and only the good or desirable actions. We can make the AI increasingly approximate something, sure, but how do we know in advance that that something is something we’d like?
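To make the contrast concrete, here is a hypothetical toy illustration (names invented for this sketch): a grader for arithmetic is easy to write because we can compute the standard we are grading against, whereas no analogous checker is known for “the action humans would ideally endorse”:

```python
def grade_arithmetic(a: int, b: int, proposed_sum: int) -> bool:
    """Easy: we can compute the right answer and compare."""
    return a + b == proposed_sum

def grade_morality(situation: str, proposed_action: str) -> bool:
    """No known procedure computes 'what humans would ideally endorse here'."""
    raise NotImplementedError("this is the unsolved part of the problem")

print(grade_arithmetic(2, 2, 4))   # True: the reward signal is well-defined
# grade_morality("trolley dilemma", "pull the lever")  # would raise NotImplementedError
```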
That’s not generally true of human-level intelligences. We wouldn’t expect a random alien species that happens to be as smart as humans to be very successful at figuring out human morality.
Assuming morality is lots of highly localised, different things...which I don’t, particularly. If it is not, then you can figure it out anywhere. If it is, then the problem the aliens have is not that morality is imponderable, but that they don’t have access to the right data. They don’t know how things are on Earth. However, an AI built on Earth would. So the situation is not analogous. The only disadvantage an AI would have is not having biological drives itself, but it is not clear that an entity needs to have drives in order to understand them. We could expect an SAI to get incrementally better at maths than us until it surpasses us; we wouldn’t worry that it would hit on the wrong maths, because maths is not a set of arbitrary, disconnected facts.
But humans aren’t very good at figuring out morality; they can make serious mistakes
An averagely intelligent AI with an average grasp of morality would not be more of a threat than an average human. A smart AI would, all other things being equal, be better at figuring out morality. But all other things are not equal, because you want to create problems by walling off the UF.
(He deliberately picked ones that sound ‘stupid’ to a human mind, to make the point that human concepts have a huge amount of implicit complexity built in.)
I’m sure they do. That seems to be why progress in AGI, specifically the use of natural language, has been achingly slow. But why should moral concepts be so much more difficult than others? An AI smart enough to talk its way out of a box would be able to understand the implicit complexity; an AI too dumb to understand implicit complexity would be boxable. Where is the problem?
Utility functions that change over time are more dangerous than stable ones, because it’s harder to predict how a descendant of a seed AI with a heavily modified utility function will behave than it is to predict how a descendant with the same utility function will behave.
Things are not inherently dangerous just because they are unpredictable. If you have some independent reason for thinking something might turn dangerous, then it becomes desirable to predict it.
But superintelligent artificial general intelligences are generally assumed to be good at everything: they are not assumed to develop mysterious blind spots about falconry or mining engineering. Why assume they will develop a blind spot about morality? Oh yes...because you have assumed from the outset that the UF must be walled off from self-improvement...in order to be safe. You are only facing that particular failure mode because of a decision you made in order to be safe.
If we don’t solve the problem of Friendly AI ourselves, we won’t know what trajectory of self-modification to set the AI on in order for it to increasingly approximate Friendliness
The average person manages to solve the problem of being moral themselves, in a good-enough way. You keep assuming, without explanation, that an AI can’t do the same.
We can’t tell it to increasingly approximate something that we ourselves cannot formalize and cannot point to clear empirical evidence of.
Why isn’t the lack of a formalisation of morality a problem for humans? We know how humans incrementally improve as moral reasoners: it’s called the Kohlberg hierarchy.
We don’t understand human morality or desire, so we can’t design a Morality Test or Wish Test that we know for sure will reward all and only the good or desirable actions.
We don’t have perfect morality tests. We do have morality tests. Fail them and you get pilloried in the media or sent to jail.
We can make the AI increasingly approximate something, sure, but how do we know in advance that that something is something we’d like?
Again, you are assuming that morality is something highly local and arbitrary. If it works like arithmetic, that is, if it is an expansion of some basic principles, then we can tell that it is heading in the right direction by identifying that its reasoning is in line with those principles.
Assuming morality is lots of highly localised, different things...which I don’t, particularly.
The problem of FAI is the problem of figuring out all of humanity’s deepest concerns and preferences, not just the problem of figuring out the ‘moral’ ones (whichever those are). E.g., we want a superintelligence to not make life boring for everyone forever, even if ‘don’t bore people’ isn’t a moral imperative.
Regardless, I don’t see how the moral subset of human concerns could be simplified without sacrificing most human intuitions about what’s right and wrong. Human intuitions as they stand aren’t even consistent, so I don’t understand how you can think the problem of making them consistent and actionable is going to be a simple one.
If it is not, then you can figure it out anywhere.
Someday, perhaps. With enough time and effort invested. Still, again, we would expect a lot more human-intelligence-level aliens (even if those aliens knew a lot about human behavior) to be good at building better AIs than to be good at formalizing human value. For the same reason, we should expect a lot more possible AIs we could build to be good at building better AIs than to be good at formalizing human value.
If it is, then the problem the aliens have is not that morality is imponderable
I don’t know what you mean by ‘imponderable’. Morality isn’t ineffable; it’s just way too complicated for us to figure out. We know how things are on Earth; we’ve been gathering data and theorizing about morality for centuries. And our progress in formalizing morality has been minimal.
An averagely intelligent AI with an average grasp of morality would not be more of a threat than an average human.
An AI that’s just a copy of a human running on transistors is much more powerful than a human, because it can think and act much faster.
A smart AI would, all other things being equal, be better at figuring out morality.
It would also be better at figuring out how many atoms are in my fingernail, but that doesn’t mean it will ever get an exact count. The question is how rough an approximation of human value can we allow before all value is lost; this is the ‘fragility of values’ problem. It’s not enough for an AGI to do better than us at FAI; it has to be smart enough to solve the problem to a high level of confidence and precision.
But why should moral concepts be so much more difficult than others?
First, because they’re anthropocentric; ‘iron’ can be defined simply because it’s a common pattern in Nature, not a rare high-level product of a highly contingent and complex evolutionary history. Second, because they’re very inclusive; ‘what humans care about’ or ‘what humans think is Right’ is inclusive of many different human emotions, intuitions, cultural conventions, and historical accidents.
But the main point is just that human value is difficult, not that it’s the most difficult thing we could do. If other tasks are also difficult, that doesn’t necessarily make FAI easier.
An AI smart enough to talk its way out of a box would be able to understand the implicit complexity; an AI too dumb to understand implicit complexity would be boxable. Where is the problem?
You’re forgetting the ‘seed is not the superintelligence’ lesson from The genie knows, but doesn’t care. If you haven’t read that article, go do so. The seed AI is dumb enough to be boxable, but also too dumb to plausibly solve the entire FAI problem itself. The superintelligent AI is smart enough to solve FAI, but also too smart to be safely boxed; and it doesn’t help us that an unFriendly superintelligent AI has solved FAI, if by that point it’s too powerful for us to control. You can’t safely pass the buck to a superintelligence to tell us how to build a superintelligence safe enough to pass bucks to.
Things are not inherently dangerous just because they are unpredictable. If you have some independent reason for thinking something might turn dangerous, then it becomes desirable to predict it.
Yes. The five theses give us reason to expect superintelligent AI to be dangerous by default. Adding more unpredictability to a system that already seems dangerous will generally make it more dangerous.
they are not assumed to develop mysterious blind spots about falconry or mining engineering. Why assume they will develop a blind spot about morality?
‘The genie knows, but doesn’t care’ means that the genie (i.e., superintelligence) knows how to do human morality (or could easily figure it out, if it felt like trying), but hasn’t been built to care about human morality. Knowing how to behave the way humans want you to is not sufficient for actually behaving that way; Eliezer makes that point well in No Universally Compelling Arguments.
The worry isn’t that the superintelligence will be dumb about morality; it’s that it will be indifferent to morality, and that by the time it exists it will be too late to safely change that indifference. The seed AI (which is not a superintelligence, but is smart enough to set off a chain of self-modifications that lead to a superintelligence) is dumb about morality (approximately as dumb as humans are, if not dumber), and is also probably not a particularly amazing falconer or miner. It only needs to be a competent programmer, to qualify as a seed AI.
The average person manages to solve the problem of being moral themselves, in a good-enough way.
Good enough for going to the grocery store without knifing anyone. Probably not good enough for safely ruling the world. With greater power comes a greater need for moral insight, and a greater risk should that insight be absent.
Why isn’t the lack of a formalisation of morality a problem for humans?
It is a problem, and it leads to a huge amount of human suffering. It doesn’t mean we get everything wrong, but we do make moral errors on a routine basis; the consequences are mostly non-catastrophic because we’re slow, weak, and have adopted some ‘good-enough’ heuristics for bounded circumstances.
We know how humans incrementally improve as moral reasoners: it’s called the Kohlberg hierarchy.
Just about every contemporary moral psychologist I’ve read or talked to seems to think that Kohlberg’s overall model is false. (Though some may think it’s a useful toy model, and it certainly was hugely influential in its day.) Haidt’s The Emotional Dog and Its Rational Tail gets cited a lot in this context.
We do have morality tests. Fail them and you get pilloried in the media or sent to jail.
That’s certainly not good enough. Build a superintelligence that optimizes for ‘following the letter of the law’ and you don’t get a superintelligence that cares about humans’ deepest values. The law itself has enough inexactness and arbitrariness that it causes massive needless human suffering on a routine basis, though it’s another one of those ‘good-enough’ measures we keep in place to stave off even worse descents into darkness.
If it works like arithmetic, that is if it is an expansion of some basic principles
Human values are an evolutionary hack resulting from adaptations to billions of different selective pressures over billions of years, innumerable side-effects of those adaptations, genetic drift, etc. Arithmetic can be formalized in a few sentences. Why think that humanity’s deepest preferences are anything like that simple? Our priors should be very low for ‘human value is simple’ just given the etiology of human value, and our failure to converge on any simple predictive or normative theory thus far seems to only confirm this.
The problem of FAI is the problem of figuring out all of humanity’s deepest concerns and preferences, not just the problem of figuring out the ‘moral’ ones (whichever those are)
Those two things turn out to be identical (deepest concerns and preferences = the ‘moral’ ones), because nothing else can be of greater importance to a decision maker.
An AI smart enough to talk its way out of a box would be able to understand the implicit complexity; an AI too dumb to understand implicit complexity would be boxable. Where is the problem?
You’re forgetting the ‘seed is not the superintelligence’ lesson from The genie knows, but doesn’t care. If you haven’t read that article, go do so. The seed AI is dumb enough to be boxable, but also too dumb to plausibly solve the entire FAI problem itself.
I am arguing that it would not have to solve FAI itself.
The superintelligent AI is smart enough to solve FAI, but also too smart to be safely boxed;
Huh? If it is moral and friendly, why would you need to box it?
and it doesn’t help us that an unFriendly superintelligent AI has solved FAI, if by that point it’s too powerful for us to control. You can’t safely pass the buck to a superintelligence to tell us how to build a superintelligence safe enough to pass bucks to.
If it’s friendly, why enslave it?
Things are not inherently dangerous just because they are unpredictable. If you have some independent reason for thinking something might turn dangerous, then it becomes desirable to predict it.
Yes. The five theses give us reason to expect superintelligent AI to be dangerous by default. Adding more unpredictability to a system that already seems dangerous will generally make it more dangerous.
The five theses are variously irrelevant and misapplied. Details supplied on request.
they are not assumed to develop mysterious blind spots about falconry or mining engineering. Why assume they will develop a blind spot about morality?
‘The genie knows, but doesn’t care’ means that the genie (i.e., superintelligence) knows how to do human morality (or could easily figure it out, if it felt like trying), but hasn’t been built to care about human morality.
What genie? Who built it that way? If your policy is to build an artificial philosopher, an AI that can solve morality itself, why would you build it to not act on what it knows?
Knowing how to behave the way humans want you to is not sufficient for actually behaving that way; Eliezer makes that point well in No Universally Compelling Arguments.
No, his argument is irrelevant as explained in this comment.
The worry isn’t that the superintelligence will be dumb about morality; it’s that it will be indifferent to morality,
You don’t have to pre-programme the whole of friendliness or morality to fix that. If you have reason to suspect that there are no intrinsically compelling concepts, then you can build an AI that wants to be moral, but needs to figure out what that is.
and that by the time it exists it will be too late to safely change that indifference. The seed AI (which is not a superintelligence, but is smart enough to set off a chain of self-modifications that lead to a superintelligence) is dumb about morality (approximately as dumb as humans are, if not dumber), and is also probably not a particularly amazing falconer or miner. It only needs to be a competent programmer, to qualify as a seed AI.
Which is only a problem if you assume, as I don’t, that it will be pre-programmed with a fixed morality.
The average person manages to solve the problem of being moral themselves, in a good-enough way.
Good enough for going to the grocery store without knifing anyone. Probably not good enough for safely ruling the world. With greater power comes a greater need for moral insight, and a greater risk should that insight be absent.
With greater intelligence comes greater moral insight—unless you create a problem by walling off that part of an AI.
Why isn’t the lack of a formalisation of morality a problem for humans?
It is a problem, and it leads to a huge amount of human suffering. It doesn’t mean we get everything wrong, but we do make moral errors on a routine basis; the consequences are mostly non-catastrophic because we’re slow, weak, and have adopted some ‘good-enough’ heuristics for bounded circumstances.
OK. The consequences are non-catastrophic. An AI with imperfect, good-enough morality would not be an existential threat.
We know how humans incrementally improve as moral reasoners: it’s called the Kohlberg hierarchy.
Just about every contemporary moral psychologist I’ve read or talked to seems to think that Kohlberg’s overall model is false. (Though some may think it’s a useful toy model, and it certainly was hugely influential in its day.) Haidt’s The Emotional Dog and Its Rational Tail gets cited a lot in this context.
And does Haidt’s work mean that everyone is on par, morally? Does it mean that no one can progress in moral insight?
We do have morality tests. Fail them and you get pilloried in the media or sent to jail.
That’s certainly not good enough. Build a superintelligence that optimizes for ‘following the letter of the law’ and you don’t get a superintelligence that cares about humans’ deepest values.
It isn’t good enough for a ceiling: it is good enough for a floor.
If it works like arithmetic, that is if it is an expansion of some basic principles
Human values are an evolutionary hack resulting from adaptations to billions of different selective pressures over billions of years, innumerable side-effects of those adaptations, genetic drift, etc.
De facto ones are, yes. Likewise folk physics is an evolutionary hack. But if we build an AI to do physics, we don’t intend it to do folk physics, we intend it to do physics.
Arithmetic can be formalized in a few sentences. Why think that humanity’s deepest preferences are anything like that simple?
There’s a theory of morality that can be expressed in a few sentences, and leaves preferences as variables to be filled in later. It’s called utilitarianism.
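As a rough sketch of the sort of schema being gestured at here (my notation, not a standard formalisation), the aggregation rule fits in a line while the individual preference functions are left entirely open:

```latex
% Preference-utilitarian schema: choose the action maximising aggregate utility.
U(a) \;=\; \sum_{i \in \text{persons}} u_i(a), \qquad
a^{*} \;=\; \arg\max_{a \in A}\, U(a)
% The u_i (each person's preferences) are the unfilled variables.
```

The reply below turns on the fact that nothing in this schema says how the u_i are to be measured, compared, or computed.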
Our priors should be very low for ‘human value is simple’ just given the etiology of human value, and our failure to converge on any simple predictive or normative theory thus far seems to only confirm this.
So? If value is complex, that doesn’t affect utilitarianism, for instance. You, and other LessWrongian writers, keep behaving as though “values are X” is obviously equivalent to “morality is X”.
“The superintelligent AI is smart enough to solve FAI, but also too smart to be safely boxed;”
Huh? If it is moral and friendly, why would you need to box it?
You’re confusing ‘smart enough to solve FAI’ with ‘actually solved FAI’, and you’re confusing ‘actually solved FAI’ with ‘self-modified to become Friendly’. Most possible artificial superintelligences have no desire to invest much time into figuring out human value, and most possible ones that do figure out human value have no desire to replace their own desires with the desires of humans. If the genie knows how to build a Friendly AI, that doesn’t imply that the genie is Friendly; so superintelligence doesn’t in any way imply Friendliness even if it implies the ability to become Friendly.
No, his argument is irrelevant as explained in this comment.
Why does that comment make his point irrelevant? Are you claiming that it’s easy to program superintelligences to be ‘rational’, where ‘rationality’ doesn’t mean instrumental or epistemic rationality but instead means something that involves being a moral paragon? It just looks to me like black-boxing human morality to make it look simpler or more universal.
If you have reason to suspect that there are no intrinsically compelling concepts, then you can build an AI that wants to be moral, but needs to figure out what that is.
And how do you code that? If the programmers don’t know what ‘be moral’ means, then how do they code the AI to want to ‘be moral’? See Truly Part Of You.
An AI with imperfect, good-enough morality would not be an existential threat.
A human with superintelligence-level superpowers would be an existential threat. An artificial intelligence with superintelligence-level superpowers would therefore also be an existential threat, if it were merely as ethical as a human. If your bar is set low enough to cause an extinction event, you should probably raise your bar a bit.
And does Haidt’s work mean that everyone is on par, morally? Does it mean that no one can progress in moral insight?
No. Read Haidt’s paper, and beware of goalpost drift.
It isn’t good enough for a ceiling: it is good enough for a floor.
No. Human law isn’t built for superintelligences, so it doesn’t put special effort into blocking loopholes that would be available to an ASI. E.g., there’s no law against disassembling the Sun, because no lawmaker anticipated that anyone would have that capability.
There’s a theory of morality that can be expressed in a few sentences, and leaves preferences as variables to be filled in later. It’s called utilitarianism.
… Which isn’t computable, and provides no particular method for figuring out what the variables are. ‘Preferences’ isn’t operationalized.
You, and other LessWrongian writers, keep behaving as though “values are X” is obviously equivalent to “morality is X”.
Values in general are what matters for Friendly AI, not moral values. Moral values are a proper subset of what’s important and worth protecting in humanity.
The problem of FAI is the problem of figuring out all of humanity’s deepest concerns and preferences, not just the problem of figuring out the ‘moral’ ones (whichever those are). E.g., we want a superintelligence to not make life boring for everyone forever, even if ‘don’t bore people’ isn’t a moral imperative.
The AI might need a lot of localised information for friendliness, but it needn’t be preprogrammed.
You have assumed that friendliness is a superset of morality. Assume also that an AI is capable of being moral.
Then, to have a more fun existence, all you have to do is ask it questions, like “How can we build hovering skateboards?”
What failure modes could that lead to?
If it doesn’t know what things humans enjoy, it can research the subject...humans can entertain their pets, after all.
It would have no reason to refuse to answer questions, unless the answer was dangerous to human well-being (i.e. what humans assume is harmless fun actually isn’t). But that isn’t actually a failure; it’s a safety feature.
Regardless, I don’t see how the moral subset of human concerns could be simplified without sacrificing most human intuitions about what’s right and wrong.
That’s like saying you can’t simplify folk physics down to real physics without sacrificing a lot of intuitions. Intuitions that are wrong need to go.
Human intuitions as they stand aren’t even consistent, so I don’t understand how you can think the problem of making them consistent and actionable is going to be a simple one.
I didn’t say it was simple. I want the SAI to do it for itself. I don’t think the alternative, of solving friendliness—which is more than morality—and preprogramming it is simple.
Someday, perhaps. With enough time and effort invested. Still, again, we would expect a lot more human-intelligence-level aliens (even if those aliens knew a lot about human behavior) to be good at building better AIs than to be good at formalizing human value. For the same reason, we should expect a lot more possible AIs we could build to be good at building better AIs than to be good at formalizing human value.
So what’s the critical difference between understanding value and understanding (e.g.) language? I think the asymmetry has come in where you assume that “value” has to be understood as a ragbag of attitudes and opinions. Everyone assumes that understanding physics means understanding the kind of physics found in textbooks, and that understanding language need only go as far as understanding a cleaned-up official version, and not a superposition of every possible dialect and idiolect. Morality/value looks difficult to you because you are taking it to be the incoherent mess you would get by throwing everyone’s attitudes and beliefs into a pot indiscriminately. But many problems would be insoluble under that assumption.
If it is, then the problem the aliens have is not that morality is imponderable
I don’t know what you mean by ‘imponderable’. Morality isn’t ineffable; it’s just way too complicated for us to figure out.
If you assume that all intuitions have to be taken into account, even conflicting ones, then it’s not just difficult, it’s impossible. But I don’t assume that.
We know how things are on Earth; we’ve been gathering data and theorizing about morality for centuries. And our progress in formalizing morality has been minimal.
Yet the average person is averagely moral. People, presumably, are not running on a formalisation. If you assume that an AI has to be preprogrammed with morality, then you can conclude that an AI will need the formalisation we don’t have. If you assume that an AI is a learning system, then it can learn and does not need to be preprogrammed.
An averagely intelligent AI with an average grasp of morality would not be more of a threat than an average human.
An AI that’s just a copy of a human running on transistors is much more powerful than a human, because it can think and act much faster.
If you speed up a chicken brain by a million, what do you get?
A smart AI would, all other things being equal, be better at figuring out morality.
It would also be better at figuring out how many atoms are in my fingernail, but that doesn’t mean it will ever get an exact count. The question is how rough an approximation of human value can we allow before all value is lost; this is the ‘fragility of values’ problem.
There is no good reason to think that “morality” and “values” are synonyms.
It’s not enough for an AGI to do better than us at FAI; it has to be smart enough to solve the problem to a high level of confidence and precision.
If it’s too dumb to solve it, it’s too dumb to be a menace; if it’s smart enough to be a menace, it’s smart enough to solve it.
But why should moral concepts be so much more difficult than others?
First, because they’re anthropocentric;
Are they? Animals can be morally relevant to humans. Humans can be morally relevant to aliens. Aliens can be morally relevant to each other.
‘iron’ can be defined simply because it’s a common pattern in Nature, not a rare high-level product of a highly contingent and complex evolutionary history.
Anthropocentric and universal aren’t the only options. Alien morality is a coherent concept, like alien art and alien economics.
Second, because they’re very inclusive; ‘what humans care about’ or ‘what humans think is Right’ is inclusive of many different human emotions, intuitions, cultural conventions, and historical accidents.
Morality is about what is right, not what is believed to be. Physics is not folk physics or the history of physics.
The AI might need a lot of localised information for friendliness, but it needn’t be preprogrammed.
I don’t know what you mean by ‘preprogrammed’, and I don’t know what view you think you’re criticizing by making this point. MIRI generally supports indirect normativity, not direct normativity.
You have assumed that friendliness is a superset of morality.
A Friendly AI is, minimally, a situation-generally safe AGI. By the intelligence explosion thesis, ‘situation-generally’ will need to encompass the situation in which an AGI self-modifies to an ASI (artificial superintelligence), and since ASI are much more useful and dangerous than human-level AGIs, the bulk of the work in safety-proofing AGI will probably go into safety-proofing ASI.
A less minimal definition will say that Friendly AIs are AGIs that bring about situations humans strongly desire/value, and don’t bring about situations they strongly dislike/disvalue. One could also treat this as an empirical claim about the more minimal definition: Any adequately safe AGI will be extremely domain-generally useful.
Regardless, nowhere in the above two paragraphs did I talk specifically about morality. Moral values are important, but they are indeed a proper subset of human values, and we don’t want an AGI to make everything worse forever even if it finds a way to do so without doing anything ‘immoral’.
Assume also that an AI is capable of being moral.
No one has assumed otherwise. The problem isn’t that Friendly AI is impossible; it’s that most ASIs aren’t Friendly, and unFriendly ASIs seem to be easier to build (because they’re a more generic class).
That’s like saying you can’t simplify folk physics down to real physics without sacrificing a lot of intuitions. Intuitions that are wrong need to go.
Getting rid of wrong intuitions may well make morality more complicated, rather than less. We agree that human folk morality may need to be refined a lot, but that gives us no reason to expect the task to be easy or the end-product to be simple. Physical law appears to be simple, but it begets high-level regularities that are much less simple, like brains, genomes, and species. Morality occurs at a level closer to brains, genomes, and species than to physical law.
But many problems would be insoluble under that assumption.
If human civilization depended on building an AI that can domain-generally speak English in a way that we’d ideally recognize as Correct, then I would be extremely worried. We can get away with shortcuts and approximations because speaking English correctly isn’t very important. But getting small things permanently wrong about human values is important, when you’re in control of the future of humanity.
It might not look fair that humans have to deal with such a huge problem, but the universe doesn’t always give people reasonable-sized challenges.
If you assume that an AI is a learning system, then it can learn and does not need to be preprogrammed.
It has to be preprogrammed to learn the right things, and to incorporate the right things it’s learned into its preferences. Saying ‘Just program the AI to learn the right preferences’ doesn’t solve the problem; programming the AI to learn the right preferences is the problem. See Detached lever fallacy:
“All this goes to explain why you can’t create a kindly Artificial Intelligence by giving it nice parents and a kindly (yet occasionally strict) upbringing, the way it works with a human baby. As I’ve often heard proposed.
“It is a truism in evolutionary biology that conditional responses require more genetic complexity than unconditional responses. To develop a fur coat in response to cold weather requires more genetic complexity than developing a fur coat whether or not there is cold weather, because in the former case you also have to develop cold-weather sensors and wire them up to the fur coat.
“But this can lead to Lamarckian delusions: Look, I put the organism in a cold environment, and poof, it develops a fur coat! Genes? What genes? It’s the cold that does it, obviously.
“There were, in fact, various slap-fights of this sort, in the history of evolutionary biology—cases where someone talked about an organismal response accelerating or bypassing evolution, without realizing that the conditional response was a complex adaptation of higher order than the actual response. (Developing a fur coat in response to cold weather, is strictly more complex than the final response, developing the fur coat.) [...]
“But the upshot is that if you have a little baby AI that is raised with loving and kindly (but occasionally strict) parents, you’re pulling the levers that would, in a human, activate genetic machinery built in by millions of years of natural selection, and possibly produce a proper little human child. Though personality also plays a role, as billions of parents have found out in their due times.
“It’s easier to program in unconditional niceness, than a response of niceness conditional on the AI being raised by kindly but strict parents. If you don’t know how to do that, you certainly don’t know how to create an AI that will conditionally respond to an environment of loving parents by growing up into a kindly superintelligence. If you have something that just maximizes the number of paperclips in its future light cone, and you raise it with loving parents, it’s still going to come out as a paperclip maximizer. There is not that within it that would call forth the conditional response of a human child. Kindness is not sneezed into an AI by miraculous contagion from its programmers. Even if you wanted a conditional response, that conditionality is a fact you would have to deliberately choose about the design.
“Yes, there’s certain information you have to get from the environment—but it’s not sneezed in, it’s not imprinted, it’s not absorbed by magical contagion. Structuring that conditional response to the environment, so that the AI ends up in the desired state, is itself the major problem. ‘Learning’ far understates the difficulty of it—that sounds like the magic stuff is in the environment, and the difficulty is getting the magic stuff inside the AI. The real magic is in that structured, conditional response we trivialize as ‘learning’. That’s why building an AI isn’t as easy as taking a computer, giving it a little baby body and trying to raise it in a human family. You would think that an unprogrammed computer, being ignorant, would be ready to learn; but the blank slate is a chimera.”
The AI might need a lot of localised information for friendliness, but it needn’t be preprogrammed.
I don’t know what you mean by ‘preprogrammed’,
I mean the proposal to solve morality and code it into an AI.
and I don’t know what view you think you’re criticizing by making this point. MIRI generally supports indirect normativity, not direct normativity.
You have assumed that friendliness is a superset of morality.
A Friendly AI is, minimally, a situation-generally safe AGI.
Which is to say that full-fat friendliness is a superset of minimal friendliness. But minimal friendliness is just what I have been calling morality, and I don’t see why I shouldn’t continue. So friendliness is a superset of morality, as I said.
By the intelligence explosion thesis, ‘situation-generally’ will need to encompass the situation in which an AGI self-modifies to an ASI (artificial superintelligence), and since ASI are much more useful and dangerous than human-level AGIs, the bulk of the work in safety-proofing AGI will probably go into safety-proofing ASI.
...by your assumption that morality/friendliness needs to be solved separately from intelligence. But that is just what I am disputing.
A less minimal definition will say that Friendly AIs are AGIs that bring about situations humans strongly desire/value, and don’t bring about situations they strongly dislike/disvalue. One could also treat this as an empirical claim about the more minimal definition: Any adequately safe AGI will be extremely domain-generally useful.
An AGI can be useful without wanting to do anything but answer questions accurately.
Regardless, nowhere in the above two paragraphs did I talk specifically about morality.
You didn’t use the word. But I think “not doing bad things, whilst not necessarily doing fun things either” picks out the same referent.
Moral values are important, but they are indeed a proper subset of human values, and we don’t want an AGI to make everything worse forever even if it finds a way to do so without doing anything ‘immoral’.
I find it hard to interpret that statement. How can making things worse forever not be immoral? What non-moral definition of ‘worse’ are you using?
That’s like saying you can’t simplify folk physics down to real physics without sacrificing a lot of intuitions. Intuitions that are wrong need to go.
Getting rid of wrong intuitions may well make morality more complicated, rather than less. We agree that human folk morality may need to be refined a lot, but that gives us no reason to expect the task to be easy or the end-product to be simple
We have very good reason to think that the one true theory of something will be simpler, in Kolmogorov terms, than a mishmash of everybody’s guesses. Physics is simpler than folk physics. (It is harder to learn, because that requires the effortful System II to engage...but effort and complexity are different things.)
And remember, my assumption is that the AI works out morality itself.
Physical law appears to be simple, but it begets high-level regularities that are much less simple, like brains, genomes, and species. Morality occurs at a level closer to brains, genomes, and species than to physical law.
If an ASI can figure out such high-level subjects as biology and decision theory, why shouldn’t it be able to figure out morality?
But many problems would be insoluble under that assumption.
If human civilization depended on building an AI that can domain-generally speak English in a way that we’d ideally recognize as Correct, then I would be extremely worried. We can get away with shortcuts and approximations because speaking English correctly isn’t very important. But getting small things permanently wrong about human values is important, when you’re in control of the future of humanity.
Why wouldn’t an AI that is smarter than us be able to realise that for itself?
It might not look fair that humans have to deal with such a huge problem, but the universe doesn’t always give people reasonable-sized challenges.
If you assume that an AI is a learning system, then it can learn and does not need to be preprogrammed.
It has to be preprogrammed to learn the right things, and to incorporate the right things it’s learned into its preferences.
That is confusingly phrased. A learning system needs some basis to learn, granted. You assume, tacitly, that it need not be preprogrammed with the right rules of grammar or economics. Why make an exception for ethics?
Saying ‘Just program the AI to learn the right preferences’ doesn’t solve the problem; programming the AI to learn the right preferences is the problem. See Detached lever fallacy.
A learning system needs some basis other than external stimulus to learn; given that, it is quite possible for most of the information to be contained in the stimulus, the data. Consider language. Do you think an AI will have to be preprogrammed with all the contents of every dictionary?
“All this goes to explain why you can’t create a kindly Artificial Intelligence by giving it nice parents and a kindly (yet occasionally strict) upbringing, the way it works with a human baby. As I’ve often heard proposed.
“It is a truism in evolutionary biology that conditional responses require more genetic complexity than unconditional responses. To develop a fur coat in response to cold weather requires more genetic complexity than developing a fur coat whether or not there is cold weather, because in the former case you also have to develop cold-weather sensors and wire them up to the fur coat.
“But this can lead to Lamarckian delusions: Look, I put the organism in a cold environment, and poof, it develops a fur coat! Genes? What genes? It’s the cold that does it, obviously.
“There were, in fact, various slap-fights of this sort, in the history of evolutionary biology—cases where someone talked about an organismal response accelerating or bypassing evolution, without realizing that the conditional response was a complex adaptation of higher order than the actual response. (Developing a fur coat in response to cold weather, is strictly more complex than the final response, developing the fur coat.) [...]
“But the upshot is that if you have a little baby AI that is raised with loving and kindly (but occasionally strict) parents, you’re pulling the levers that would, in a human, activate genetic machinery built in by millions of years of natural selection, and possibly produce a proper little human child. Though personality also plays a role, as billions of parents have found out in their due times.
“It’s easier to program in unconditional niceness, than a response of niceness conditional on the AI being raised by kindly but strict parents. If you don’t know how to do that, you certainly don’t know how to create an AI that will conditionally respond to an environment of loving parents by growing up into a kindly superintelligence. If you have something that just maximizes the number of paperclips in its future light cone, and you raise it with loving parents, it’s still going to come out as a paperclip maximizer. There is not that within it that would call forth the conditional response of a human child. Kindness is not sneezed into an AI by miraculous contagion from its programmers. Even if you wanted a conditional response, that conditionality is a fact you would have to deliberately choose about the design.
“Yes, there’s certain information you have to get from the environment—but it’s not sneezed in, it’s not imprinted, it’s not absorbed by magical contagion. Structuring that conditional response to the environment, so that the AI ends up in the desired state, is itself the major problem. ‘Learning’ far understates the difficulty of it—that sounds like the magic stuff is in the environment, and the difficulty is getting the magic stuff inside the AI. The real magic is in that structured, conditional response we trivialize as ‘learning’. That’s why building an AI isn’t as easy as taking a computer, giving it a little baby body and trying to raise it in a human family. You would think that an unprogrammed computer, being ignorant, would be ready to learn; but the blank slate is a chimera.”
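One way to picture the quoted point in miniature is the toy sketch below (the class names are invented for illustration, not anyone’s actual design): an environment such as a kind upbringing can only shape an agent if the agent’s own code already contains the conditional machinery that reads it.

    from dataclasses import dataclass

    @dataclass
    class Upbringing:
        kind: bool  # the environmental "lever" the parents can pull

    class PaperclipMaximizer:
        """Unconditional goal: nothing in its code reads the upbringing at all."""
        def goal(self, upbringing: Upbringing) -> str:
            return "maximise paperclips"  # the environment is simply ignored

    class ConditionallyNiceAgent:
        """The conditional response itself had to be designed in by the programmer."""
        def goal(self, upbringing: Upbringing) -> str:
            # This branch is the extra machinery the quoted passage describes:
            # a sensor for the environment, wired to a change in behaviour.
            return "be kind" if upbringing.kind else "be wary"

    loving_home = Upbringing(kind=True)
    print(PaperclipMaximizer().goal(loving_home))      # -> maximise paperclips
    print(ConditionallyNiceAgent().goal(loving_home))  # -> be kind

The loving home only makes a difference to the second agent because its designer chose to make it conditional; “raising” the first one changes nothing.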
I think that RobbBB has already done a great job of responding to this, but I’d like to have a try at it too. I’d like to explore the math/morality analogy a bit more. I think I can make a better comparison.
Math is an enormous field of study. Even if we limited our concept of “math” to drawing graphs of mathematical functions, we would still have an enormous range of different kinds of functions: Hyperbolic, exponential, polynomial, all the trigonometric functions, etc. etc.
Instead of comparing math to morality, I think it’s more illustrative to compare math to the wider topic of “value-driven-behaviour”.
An intelligent creature could have all sorts of different values. Even within the realm of modern, western, democratic morality we still disagree about whether it is just and proper to execute murderers. We disagree about the extent to which a state is obligated to protect its citizens and provide a safety net. We disagree about the importance of honesty, of freedom vs. safety, freedom of speech vs. protection from hate speech.
If you look at the wider world, and at cultures through history, you’ll find a much wider range of moralities. People who thought it was not just permitted, but morally required that they enslave people, restrict the freedoms of their own families, and execute people for religious transgressions.
You might think that these are all better or worse approximations of the “one true morality”, and that a superintelligence could work out what that true morality is. But we don’t think so. We believe that these are different moralities. Fundamentally, these people have different values.
Then we can step further out, and look at the “insane” value systems that a person could hold. Perhaps we could believe that all people are so flawed that they must be killed. Or we could believe that no one should ever be allowed to die, and so we extend life indefinitely, even for people in agony. Or we might believe everyone should be lobotomised for our own safety.
And then there are the truly inhuman value systems: the paperclip maximisers, the prime pebble sorters, and the baby eaters. The idea is that a superintelligence could comprehend any and all of these. It would be able to optimise for any one of them, and foresee results and possible consequences for all of them. The question is: which one would it actually use?
A superintelligence might be able to understand all of human math and more besides, but we wouldn’t build one to simply “do all of maths”. We would build it with a particular goal and purpose in mind. For instance (to pick an arbitrary example) we might need it to create graphs of Hyperbolic functions. It’s a bad example, I know. But I hope it serves to help make the point.
Likewise, we would want the intelligence to adopt a specific set of values. Perhaps we would want them to be modern, western, democratic liberal values.
I wouldn’t expect a superintelligence to start generating Hyperbolic functions, despite the fact that it’s smart enough to do so. The AI would have no reason to start doing that particular task. It might be smart enough to work out that that’s what we want, of course, but that doesn’t mean it’ll do it (unless we’ve already solved the problem of getting it to do “what humans want it to do”). If we want Hyperbolic functions, we’ll have to program the machine with enough focus to make it do that.
Likewise, a computer could have any arbitrary utility function, any arbitrary set of values. We can’t make sure that a computer has the “right” values unless we know how to clearly define the values we want.
With Hyperbolic functions, it’s relatively easy to describe exactly, unambiguously, what we want. But morality is much harder to pin down.
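To make that contrast concrete, here is a minimal sketch (the function names are invented for illustration): the first task can be written down exactly, while nobody knows what to put in the body of the second.

    import math

    def hyperbolic_task(x: float) -> tuple[float, float]:
        """Exactly and unambiguously specified: compute sinh(x) and cosh(x)."""
        return math.sinh(x), math.cosh(x)

    def human_values_utility(outcome: str) -> float:
        """The analogous 'morality' objective: there is no agreed formal
        specification to hand to a machine, and different people would
        fill this in differently."""
        raise NotImplementedError("this is the part we cannot yet pin down")

    print(hyperbolic_task(1.0))  # works: (1.1752..., 1.5430...)
    # human_values_utility("tile the universe with smiley faces")  # undefined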
An intelligent creature could have all sorts of different values. Even within the realm of modern, western, democratic morality we still disagree about whether it is just and proper to execute murderers. We disagree about the extent to which a state is obligated to protect its citizens and provide a safety net. We disagree about the importance of honesty, of freedom vs. safety, freedom of speech vs. protection from hate speech.
The range of possible values is only a problem if you hold to the theory that morality simply “is” values, without any further qualification; in that case an AI is going to have trouble figuring out morality a priori. If you take the view that morality is a fairly uniform way of handling values, or a subset of values, then the AI can figure it out by taking prevailing values as input, as data.
(We will be arguing that:-
Ethics fulfils a role in society, and originated as a mutually beneficial way of regulating individual actions to minimise conflict, and solve coordination problems. (“Social Realism”).
No spooky or supernatural entities or properties are required to explain ethics (naturalism is true)
There is no universally correct system of ethics. (Strong moral realism is false)
Multiple ethical constructions are possible...
Our version of ethical objectivism needs to be distinguished from universalism as well as realism,
Ethical universalism is unlikely...it is unlikely that different societies would have identical ethics under different circumstances. Reproductive technology must affect sexual ethics. The availability of different food sources in the environment must affect vegetarianism versus meat eating. However, a compromise position can allow object-level ethics to vary non-arbitrarily.
In other words, there is not an objective answer to questions of the form “should I do X”, but there is an answer to the question “As a member of a society with such-and-such prevailing conditions, should I do X”. In other words still, there is no universal (object level) ethics, but there is an objective-enough ethics, which is relativised to societies and situations by objective features of societies and situations...our meta ethics is a function from situations to object level ethics, and since both the function and its parameters are objective, the output is objective.
By objectivism-without-realism, we mean that mutually isolated groups of agents would be able to converge onto the same object level ethics under the same circumstances, although this convergence doesn’t imply the pre-existence of some sort of moral object, as in standard realism. We take ethics to be a social arrangement, or cultural artefact which fulfils a certain role or purpose, characterised by the reduction of conflict, allocation of resources and coordination of behaviour. By objectivism-without-universalism we mean that groups of agents under different circumstances will come up with different ethics. In either case, the functional role of ethics, in combination with the constraints imposed by concrete situations, conspire to narrow down the range of workable solutions, and (sufficiently) ideal reasoners will therefore be able to re-discover them.
)
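A rough code sketch of that last claim, purely to fix ideas (the situation features and rules are invented placeholders): the meta-level mapping is the same for every society, but the object-level ethics it outputs varies with the objective features of the situation.

    from dataclasses import dataclass

    @dataclass
    class Situation:
        food_is_scarce: bool
        reproductive_tech: bool

    def object_level_ethics(s: Situation) -> dict[str, str]:
        """One fixed function; different outputs for different circumstances."""
        rules = {"violence": "minimise conflict; prohibit aggression"}
        rules["diet"] = "meat eating permitted" if s.food_is_scarce else "vegetarianism favoured"
        rules["sexual_ethics"] = ("decoupled from reproduction" if s.reproductive_tech
                                  else "tied to reproduction")
        return rules

    # Two isolated societies in the same circumstances converge on the same rules:
    print(object_level_ethics(Situation(food_is_scarce=True, reproductive_tech=False)))

On this picture, the objectivity lives in the function and its inputs, not in a universal rulebook.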
If you look at the wider world, and at cultures through history, you’ll find a much wider range of moralities. People who thought it was not just permitted, but morally required that they enslave people, restrict the freedoms of their own families, and execute people for religious transgressions.
I don’t have to believe those are equally valid. Descriptive relativism does not imply normative relativism. I would expect a sufficiently advanced AI, with access to data pertaining to the situation, to come up with the optimum morality for the situation: an answer that is objective but not universal. Where morality needs to vary because of situational factors (societal wealth, reproductive technology, level of threat/security, etc.), it would, but otherwise the AI would not deviate from the situational optimum to come up with reproductions of whatever suboptimal morality existed in the past.
You might think that these are all better or worse approximations of the “one true morality”, and that a superintelligence could work out what that true morality is. But we don’t think so. We believe that these are different moralities. Fundamentally, these people have different values.
Well, we believe that different moralities and different values are two different axes.
Likewise, we would want the intelligence to adopt a specific set of values. Perhaps we would want them to be modern, western, democratic liberal values.
My hypothesis is that an AI in a modern society would come out with that or something better. (For instance, egalitarianism isn’t some arbitrary peccadillo; it is a very general and highly rediscoverable meta-level principle that makes it easier for people to co-operate.)
Likewise, a computer could have any arbitrary utility function, any arbitrary set of values. We can’t make sure that a computer has the “right” values unless we know how to clearly define the values we want.
To perform the calculation, it needs to be able to research our values, which it can. It doesn’t need to share them, as I have noted several times.
And then there are the truly inhuman value systems: the paperclip maximisers, the prime pebble sorters, and the baby eaters. The idea is that a superintelligence could comprehend any and all of these. It would be able to optimise for any one of them, and foresee results and possible consequences for all of them. The question is: which one would it actually use?
You could build an AI that adopts random values and pursues them relentlessly, I suppose, but that is pretty much a case of deliberately building an unfriendly AI.
What you need is a scenario where building an AI to want to understand, research, and eventually join in with human morality goes horribly wrong.
With Hyperbolic functions, it’s relatively easy to describe exactly, unambiguously, what we want. But morality is much harder to pin down.
In detail or in principle? Given what assumptions?
So… what you’re suggesting, in short, is that a sufficiently intelligent AI can work out the set of morals which are most optimal in a given human society. (There’s the question of whether it would converge on the most optimal set of morals for the long-term benefit of the society as a whole, or the most optimal set of morals for the long-term benefit of the individual).
But let’s say the AI works out an optimal set of morals for its current society. What’s to stop the AI from metaphorically shrugging and ignoring those morals in order to rather build more paperclips? Especially given that it does not share those values.
(There’s the question of whether it would converge on the most optimal set of morals for the long-term benefit of the society as a whole, or the most optimal set of morals for the long-term benefit of the individual).
Which individual? There might be some decision theory which promotes the interests of Joe Soap against the interests of society, but there is no way I would call it morality.
But let’s say the AI works out an optimal set of morals for its current society. What’s to stop the AI from metaphorically shrugging and ignoring those morals in order to rather build more paperclips?
Its motivational system. We’re already assuming it’s motivated to make the deduction; we need to assume it’s motivated to implement. I am not bypassing the need for a goal-driven AI to have appropriate goals; I am bypassing the need for a detailed and accurate account of human ethics to be preprogrammed.
Especially given that it does not share those values.
I am not saying it necessarily does not. I am saying it does not necessarily.
Which individual? There might be some decision theory which promotes the interests of Joe Soap against the interests of society, but there is no way I would call it morality.
Ah, I may have been unclear there.
To go into more detail, then; you appear to be suggesting that optimal morality can be approached as a society-wide optimisation problem; in the current situations, these moral strictures produce a more optimal society than those, and this optimisation problem can be solved with sufficient computational resources and information.
But now, let us consider an individual example. Let us say that I find a wallet full of money on the ground. There is no owner in sight. The optimal choice for the society as a whole is that I return the money to the original owner; the optimal choice for the individual making the decision is to keep the money and use it towards my aims, whatever those are. (I can be pretty sure that the man to whom I return the money will be putting it towards his aims, not mine, and if I’m sufficiently convinced that my aims are better for society than his then I can even rationalise this action).
By my current moral structures, I would have to return the money to its original owner. But I can easily see a superintelligent AI giving serious consideration to the possibility that it can do more good for the original owner with the money than the original owner could.
Its motivational system. We’re already assuming it’s motivated to make the deduction; we need to assume it’s motivated to implement.
This, right here, is the hard problem of Friendly AI. How do we make it motivated to implement? And, more importantly, how do we know that it is motivated to implement what we think it’s motivated to implement?
I am not bypassing the need for a goal-driven AI to have appropriate goals; I am bypassing the need for a detailed and accurate account of human ethics to be preprogrammed.
You’re suggesting that it can figure out the complicated day-to-day minutae and the difficult edge cases on its own, given a suitable algorithm for optimising morality.
My experience in software design suggests that that algorithm needs to be really, really good. And extremely thoroughly checked, from every possible angle, by a lot of people.
I’m not denying that such an algorithm potentially exists. I can just think of far, far too many ways for it to go very badly wrong.
I am not saying it necessarily does not. I am saying it does not necessarily.
...point taken. It may or may not share those values.
But then we must at least give serious consideration to the worst-case scenario.
No spooky or supernatural entities or properties are required to explain ethics (naturalism is true)
There is no universally correct system of ethics. (Strong moral realism is false)
I believe that iff naturalism is true then strong moral realism is as well. If naturalism is true then there are no additional facts needed to determine what is moral than the positions of particles and the outcomes of arranging those particles differently. Any meaningful question that can be asked of how to arrange those particles or rank certain arrangements compared to others must have an objective answer because under naturalism there are no other kinds and no incomplete information. For the question to remain unanswerable at that point would require supernatural intervention and divine command theory to be true. If there can’t be an objective answer to morality, then FAI is literally impossible. Do remember that your thoughts and preference on ethics are themselves an arrangement of particles to be solved.
Instead I posit that the real morality is orders of magnitude more complicated, and finding it more difficult, than for real physics, real neurology, real social science, real economics, and can only be solved once these other fields are unified.
If we were uncertain about the morality of stabbing someone, we could hypothetically stab someone to see what happens. When the particles of the knife rearrange the particles of their heart into a form that harms them, we’ll know it isn’t moral. When a particular subset of people with extensive training use their knife to very carefully and precisely rearrange the particles of the heart to help people, we call those people doctors and pay them lots of money because they’re doing good. But without a shitload of facts about how to exactly stab someone in the heart to save their life, that moral option would be lost to you. And the real morality is a superset that includes that action along with all others.
If naturalism is true then there are no additional facts needed to determine what is moral than the positions of particles and the outcomes of arranging those particles differently. Any meaningful question that can be asked of how to arrange those particles or rank certain arrangements compared to others must have an objective answer because under naturalism there are no other kinds and no incomplete information.
Even if it were true that under naturalism we could determine the outcome of various arrangements of particles, wouldn’t we still be left with the question of which final outcome was the most morally preferable?
Do remember that your thoughts and preference on ethics are themselves an arrangement of particles to be solved.
But, you and I might have different moral preferences. How (under naturalism) do we objectively decide between your preferences and mine? And isn’t it also possible that neither your preferences nor my preferences are objectively moral?
Even if it were true that under naturalism we could determine the outcome of various arrangements of particles, wouldn’t we still be left with the question of which final outcome was the most morally preferable?
Yup.
But that’s sort-of contained within “the positions of particles” (so long as all their other properties are included, such as temperature and chemical connections and so on...might need to include rays of light and non-particle stuff too!). The two are just different ways of describing the same thing. Just like every object around you could be described either with their usual names (“keyboard”, “desk”, etc.) or with an elaborate molecule-by-molecule description. Plenty of other descriptions are possible too (like “rectangular black colored thing with a bunch of buttons with letters on it” describes my keyboard, kinda).
How (under naturalism) do we objectively decide between your preferences and mine?
You don’t. True preferences (as opposed to mistaken preferences) aren’t something you get to decide. They are facts.
But that’s sort-of contained within “the positions of particles” (so long as all their other properties are included, such as temperature and chemical connections and so on...might need to include rays of light and non-particle stuff too!). The two are just different ways of describing the same thing. Just like every object around you could be described either with their usual names (“keyboard”, “desk”, etc.) or with an elaborate molecule-by-molecule description. Plenty of other descriptions are possible too (like “rectangular black colored thing with a bunch of buttons with letters on it” describes my keyboard, kinda).
That’s an expression of ethical naturalism, not a defence of ethical naturalism.
How (under naturalism) do we objectively decide between your preferences and mine?
You don’t. True preferences (as opposed to mistaken preferences) aren’t something you get to decide. They are facts.
Missing the point. Ethics needs to sort good actors from bad—decisions about punishments and rewards depend on it.
PS are you the same person as rkyeun? If not, to what extent are you on the same page?
Missing the point. Ethics needs to sort good actors from bad—decisions about punishments and rewards depend on it.
(I’d say it needs to sort good choices from bad, which includes the choice to punish or reward.) Discovering which choices are good and which are bad is a fact finding mission. Because:
1) it’s a fact whether a certain choice will successfully fulfill a certain desire or not
And 2) that’s what “good” literally means: desirable.
So that’s what any question of goodness will be about: what will satisfy desires.
PS are you the same person as rkyeun? If not, to what extent are you on the same page?
No I’m not rkyeun. As for being on the same page...well I’m definitely a moral realist. I don’t know about their first iff-then statement though. Seems to me that strong moral realism could still exist if supernaturalism were true. Also, talking in terms of molecules is ridiculously impractical and unnecessary. I only talked in those terms because I was replying to a reply to those terms :P
(I’d say it needs to sort good choices from bad, which includes the choice to punish or reward.) Discovering which choices are good and which are bad is a fact finding mission. Because:
1) it’s a fact whether a certain choice will successfully fulfill a certain desire or not
And 2) that’s what “good” literally means: desirable.
So that’s what any question of goodness will be about: what will satisfy desires.
Whose desires? The murderer wants to murder the victim, the victim doesn’t want to be murdered. You have realism without objectivism. There is a realistic fact about people’s preferences, but since the same act can increase one person’s utility and reduce another’s, there is no unambiguous way to label an arbitrary outcome.
The murderer wants to murder the victim, the victim doesn’t want to be murdered.
Murder isn’t a foundational desire. It’s only a means to some other end. And usually isn’t even a good way to accomplish its ultimate end! It’s risky, for one thing. So usually it’s a false desire: if they knew the consequences of this murder compared to all other choices available, and they were correctly thinking about how to most certainly get what they really ultimately want, they’d almost always see a better choice.
(But even if it were foundational, not a means to some other end, you could imagine some simulation of murder satisfying both the “murderer”’s need to do such a thing and everyone else’s need for safety. Even the “murderer” would have a better chance of satisfaction, because they would be far less likely to be killed or imprisoned prior to satisfaction.)
since the same act can increase one person’s utility and reduce another’s, there is no unambiguous way to label an arbitrary outcome.
Well first, in the most trivial way, you can unambiguously label an outcome as “good for X”. If it really is (it might not be, after all, the consequences of achieving or attempting murder might be more terrible for the would-be murderer than choosing not to attempt murder).
It works the same with (some? all?) other adjectives too. For example: soluble. Is sugar objectively soluble? Depends what you try to dissolve it in, and under what circumstances. It is objectively soluble in pure water at room temperature. It won’t dissolve in gasoline.
Second, in game theory you’ll find sometimes there are options that are best for everyone. But even when there isn’t, you can still determine which choices for the individuals maximize their chance of satisfaction and such. Objectively speaking, those will be the best choices they can make (again, that’s what it means for something to be a good choice). And morality is about making the best choices.
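A toy illustration of the game-theory claim (the payoff numbers are made up): in this particular 2x2 game one outcome happens to be best for everyone, which is exactly the situation where the claim is easy to see; a true prisoner’s dilemma would not be this tidy.

    # payoffs[(choice_A, choice_B)] = (payoff to A, payoff to B)
    payoffs = {
        ("cooperate", "cooperate"): (3, 3),
        ("cooperate", "exploit"):   (0, 2),
        ("exploit",   "cooperate"): (2, 0),
        ("exploit",   "exploit"):   (1, 1),
    }

    def best_for_everyone(payoffs):
        """Outcomes that no other outcome matches or beats for both players."""
        items = list(payoffs.items())
        return [choice for choice, p in items
                if not any(q[0] >= p[0] and q[1] >= p[1] and q != p
                           for _, q in items)]

    print(best_for_everyone(payoffs))  # [('cooperate', 'cooperate')]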
It can be instrumental or terminal, as can most other criminal impulses.
But even if it were foundational, not a means to some other end, you could imagine some simulation of murder satisfying both the “murderer”’s need to do such a thing and everyone else’s need for safety. Even the “murderer” would have a better chance of satisfaction, because they would be far less likely to be killed or imprisoned prior to satisfaction
You can’t solve all ethical problems by keeping everyone in permanent simulation.
Well first, in the most trivial way, you can unambiguously label an outcome as “good for X”. If it really is
That’s no good. You can’t arrive at workable ethics by putting different weightings on the same actions from different perspectives. X stealing money from Y is good for X and bad for Y, so why disregard Y’s view? An act is either permitted or forbidden, punished or praised. You can’t say it is permissible-for-X but forbidden-for-Y if it involves both of them.
It works the same with (some? all?) other adjectives too.
No, there’s no uniform treatment of all predicates. Some are one-place, some are two-place. For instance, aesthetic choices can usually be fulfilled on a person-by-person basis.
Second, in game theory you’ll find sometimes there are options that are best for everyone.
To be precise, you sometimes find solutions that leave everyone better off, and more often find solutions that leave the average person better off.
Objectively speaking, those will be the best choices they can make (again, that’s what it means for something to be a good choice). And morality is about making the best choices.
Too vague. For someone who likes killing, to kill a lot of people is the best choice for them, but not the best ethical choice.
Discovering which choices are good and which are bad is a fact finding mission… So that’s what any question of goodness will be about: what will satisfy desires.
But, what if two different people have two conflicting desires? How do we objectively find the ethical resolution to the conflict?
But, what if two different people have two conflicting desires? How do we objectively find the ethical resolution to the conflict?
Basically: game theory.
In reality, I’m not sure there ever are precise conflicts of true foundational desires. Maybe it would help if you had some real example or something. But the best choice for each party will always be the one that maximizes their chances of satisfying their true desire.
I was surprised to hear that you doubt that there are ever conflicts in desires. But, since you asked, here is an example:
A is a sadist. A enjoys inflicting pain on others. A really wants to hurt B. B wishes not to be hurt by A. (For the sake of argument, let’s suppose that no simulation technology is available that would allow A to hurt a virtual B, and that A can be reasonably confident that A will not be arrested and brought to trial for hurting B.)
In this scenario, since A and B have conflicting desires, how does a system that defines objective goodness as that which will satisfy desires resolve the conflict?
I would be very surprised to find that a universe whose particles are arranged to maximize objective good would also contain unpaired sadists and masochists. You seem to be asking a question of the form, “But if we take all the evil out of the universe, what about evil?” And the answer is “Good riddance.” Pun intentional.
I would be very surprised to find that a universe whose particles are arranged to maximize objective good would also contain unpaired sadists and masochists.
The problem is that neither you nor BrianPansky has proposed a viable objective standard for goodness. BrianPansky said that good is that which satisfies desires, but proposed no objective method for mediating conflicting desires. And here you said “Do remember that your thoughts and preference on ethics are themselves an arrangement of particles to be solved” but proposed no way for resolving conflicts between different people’s ethical preferences. Even if satisfying desires were an otherwise reasonable standard for goodness, it is not an objective standard, since different people may have different desires. Similarly, different people may have different ethical preferences, so an individual’s ethical preference would not be an objective standard either, even if it were otherwise a reasonable standard.
You seem to be asking a question of the form, “But if we take all the evil out of the universe, what about evil?”
No, I am not asking that. I am pointing out that neither your standard nor BrianPansky’s standard is objective. Therefore neither can be used to determine what would constitute an objectively maximally good universe nor could either be used to take all evil out of the universe, nor even to objectively identify evil.
I was surprised to hear that you doubt that there are ever conflicts in desires.
Re-read what I said. That’s not what I said.
First get straight: good literally objectively does mean desirable. You can’t avoid that. Your question about conflict can’t change that (thus it’s a red herring).
As for your question: I already generally answered it in my previous post. Use Game theory. Find the actions that will actually be best for each agent. The best choice for each party will always be the one that maximizes their chances of satisfying their true desires.
I might finish a longer response to your specific example, but that takes time. For now, Richard Carrier’s Goal Theory Update probably covers a lot of that ground.
First get straight: good literally objectively does mean desirable.
It does not.
Wiktionary states that it means “Acting in the interest of good; ethical.” (There are a few other definitions, but I’m pretty sure this is the right one here). Looking through the definitions of ‘ethical’, I find “Morally approvable, when referring to an action that affects others; good. ” ‘Morally’ is defined as “In keeping of requirements of morality.”, and ‘morality’ is “Recognition of the distinction between good and evil or between right and wrong; respect for and obedience to the rules of right conduct; the mental disposition or characteristic of behaving in a manner intended to produce morally good results. ”
Nowhere in there do I see anything about “desirable”—it seems to simplify down to “following a moral code”. I therefore suspect that you’re implicitly assuming a moral code which equates “desirable” with “good”—I don’t think that this is the best choice of a moral code, but it is a moral code that I’ve seen arguments in favour of before.
But, importantly, it’s not the only moral code. Someone who follows a different moral code can easily find something that is good but not desirable; or desirable but not good.
First get straight: good literally objectively does mean desirable.
It’s not at all clear that morally good means desirable. The idea that the good is the desirable gets what force it has from the fact that “good” has a lot of nonmoral meanings. Good ice cream is desirable ice cream, but what’s that got to do with ethics?
Morally good means what it is good to do. So there is something added to “good” to get morally good: namely, it is what is good all things considered, and good to do, as opposed to good in other ways that have nothing to do with doing.
If it would be good to eat ice cream at the moment, eating ice cream is morally good. And if it would be bad to eat ice cream at the moment, eating ice cream is morally bad.
But when you say “good ice cream,” you aren’t talking about what it is good to do, so you aren’t talking about morality. Sometimes it is good to eat bad ice cream (e.g. you have been offered it in a situation where it would be rude to refuse), and then it is morally good to eat the bad ice cream, and sometimes it is bad to eat good ice cream (e.g. you have already eaten too much), and then it is morally bad to eat the good ice cream.
Morally good means what it is good to do. So there is something added to “good” to get morally good: namely, it is what is good all things considered, and good to do, as opposed to good in other ways that have nothing to do with doing.
That’s a theory of what “morally” is adding to “good”. You need to defend it against alternatives, rather than stating it as if it were obvious.
If it would be good to eat ice cream at the moment, eating ice cream is morally good.
Are you sure? How many people agree with that? Do you have independent evidence, or are you just following through the consequences of your assumptions (i.e. arguing in circles)?
I think most people would say that it doesn’t matter if you eat ice cream or not, and in that sense they might say it is morally indifferent. However, while I agree that it mainly doesn’t matter, I think they are either identifying “non-morally obligatory” with indifferent here, or else taking something that doesn’t matter much, and speaking as though it doesn’t matter at all.
But I think that most people would agree that gluttony is a vice, and that implies that there is an opposite virtue, which would mean eating the right amount and at the right time and so on. And eating ice cream when it is good to eat ice cream would be an act of that virtue.
Would you agree that discussion about “morally good” is discussion about what we ought to do? It seems to me this is obviously what we are talking about. And we should do things that are good to do, and avoid doing things that are bad to do. So if “morally good” is about what we should do, then “morally good” means something it is good to do.
I think most people would say that it doesn’t matter if you eat ice cream or not, and in that sense they might say it is morally indifferent. However, while I agree that it mainly doesn’t matter, I think they are either identifying “non-morally obligatory” with indifferent here, or else taking something that doesn’t matter much, and speaking as though it doesn’t matter at all.
What is wrong with saying it doesn’t matter at all?
But I think that most people would agree that gluttony is a vice, and that implies that there is an opposite virtue, which would mean eating the right amount and at the right time and so on. And eating ice cream when it is good to eat ice cream would be an act of that virtue
That’s pretty much changing the subject.
Would you agree that discussion about “morally good” is discussion about what we ought to do?
And we should do things that are good to do, and avoid doing things that are bad to do
I think it is about what we morally ought to do. If you are playing chess, you ought to move the bishop diagonally, but that is again non-moral.
We morally-should do what is morally good, and hedonistically-should do what is hedonistically-good, and so on. These can conflict, so they are not the same.
Talking about gluttony and temperance was not changing the subject. Most people think that morally good behavior is virtuous behavior, and morally bad behavior vicious behavior. So that implies that gluttony is morally bad, and temperance morally good. And if eating too much ice cream can be gluttony, then eating the right amount can be temperance, and so morally good.
There is a lot wrong with saying “it doesn’t matter at all”, but basically you would not bother with eating ice cream unless you had some reason for it, and any reason would contribute to making it a good thing to do.
I disagree completely with your statements about should, which do not correspond with any normal usage. No one talks about “hedonistically should.”
To reduce this to its fundamentals:
“I should do something” means the same thing as “I ought to do something”, which means the same thing as “I need to do something, in order to accomplish something else.”
Now if we can put whatever we want for “something else” at the end there, then you can have your “hedonistically should” or “chess playing should” or whatever.
But when we are talking about morality, that “something else” is “doing what is good to do.” So “what should I do?” has the answer “whatever you need to do, in order to be doing something good to do, rather than something bad to do.”
Talking about gluttony and temperance was not changing the subject. Most people think that morally good behavior is virtuous behavior, and morally bad behavior vicious behavior. So that implies that gluttony is morally bad, and temperance morally good. And if eating too much ice cream can be gluttony, then eating the right amount can be temperance, and so morally good.
It’s changing the subject because you are switching from an isolated act to a pattern of behaviour.
There is a lot wrong with saying “it doesn’t matter at all”,
Such as?
but basically you would not bother with eating ice cream unless you had some reason for it, and any reason would contribute to making it a good thing to do.
You are using good to mean morally good again.
I disagree completely with your statements about should, which do not correspond with any normal usage. No one talks about “hedonistically should.”
You can’t infer the non-existence of a distinction from the fact that it is not regularly marked in ordinary language.
“Jade is an ornamental rock. The term jade is applied to two different metamorphic rocks that are composed of different silicate minerals:
Nephrite consists of a microcrystalline interlocking fibrous matrix of the calcium, magnesium-iron rich amphibole mineral series tremolite (calcium-magnesium)-ferroactinolite (calcium-magnesium-iron). The middle member of this series with an intermediate composition is called actinolite (the silky fibrous mineral form is one form of asbestos). The higher the iron content, the greener the colour.
Jadeite is a sodium- and aluminium-rich pyroxene. The precious form of jadeite jade is a microcrystalline interlocking growth of jadeite crystals.”
“I should do something” means the same thing as “I ought to do something”, which means the same thing as “I need to do something, in order to accomplish something else.”
So you say. Actually, the idea that ethical claims can be cashed out as hypotheticals is quite contentious.
Now if we can put whatever we want for “something else” at the end there, then you can have your “hedonistically should” or “chess playing should” or whatever.
But when we are talking about morality, that “something else” is “doing what is good to do.” So “what should I do?” has the answer “whatever you need to do, in order to be doing something good to do, rather than something bad to do.”
Back to the usual problem. “What you morally-should do is whatever you need to do, in order to be doing something morally good” is true but vacuous. “What you morally-should do is whatever you need to do, in order to be doing something good” is debatable.
The point about the words is that it is easy to see from their origins that they are about hypothetical necessity. You NEED to do something. You MUST do it. You OUGHT to do it, that is you OWE it and you MUST pay your debt. All of that says that something has to happen, that is, that it is somehow necessary.
Now suppose you tell a murderer, “It is necessary for you to stop killing people.” He can simply say, “Necessary, is it?” and then kill you. Obviously it is not necessary, since he can do otherwise. So what did you mean by calling it necessary? You meant it was necessary for some hypothesis.
I agree that some people disagree with this. They are not listening to themselves talk.
The reason that moral good means doing something good, is that the hypothesis that we always care about, is whether it would be good to do something. That gives you a reason to say “it is necessary” without saying for what, because everyone wants to do something that would be good to do.
Suppose you define moral goodness to be something else. Then it might turn out that it would be morally bad to do something that would be good to do, and morally good to do something that would be bad to do. But in that case, who would say that we ought to do the thing which is morally good, instead of the thing that would be good to do? They would say we should do the thing that would be good to do, again precisely because it is necessary, and therefore we MUST do the supposedly morally bad thing, in order to be doing something good to do.
Now suppose you tell a murderer, “It is necessary for you to stop killing people.” He can simply say, “Necessary, is it?” and then kill you. Obviously it is not necessary, since he can do otherwise. So what did you mean by calling it necessary? You meant it was necessary for some hypothesis.
You are assuming that the only thing that counts as necessity per se is physical necessity, i.e. that there is no physical possibility of doing otherwise. But moral necessity is more naturally cashed out as the claim that there is no permissible state of affairs in which the murderer can murder.
In less abstract terms, what we are saying is that morality does not work like a common-or-garden in-order-to-achieve-X-do-Y, because you cannot excuse yourself, or obtain permissibility, simply by stating that you have some end in mind other than being moral. Even without logical necessity, morality has social obligatoriness, and that needs to be explained; a vanilla account in terms of hypothetical necessities in order to achieve arbitrary ends cannot do that.
The reason that moral good means doing something good, is that the hypothesis that we always care about, is whether it would be good to do something.
If the moral good were just a rubber-stamp of approval for whatever we have in our utility functions, there would be no need for morality as a behaviour-shaping factor in human society. Morality is not “do what thou wilt”.
That gives you a reason to say “it is necessary” without saying for what, because everyone wants to do something that would be good to do.
In some sense of “good”, but, as usual, an unqualified “good” does not give you plausible morality.
Suppose you define moral goodness to be something else. Then it might turn out that it would be morally bad to do something that would be good to do, and morally good to do something that would be bad to do. But in that case, who would say that we ought to do the thing which is morally good, instead of the thing that would be good to do?
It’s tautologous that we morally-should do what is morally-good.
The “no permissible state of affairs” idea is also hypothetical necessity: “you must do this, if we want a situation which we call permissible.”
As I think I have stated previously, the root of this disagreement is that you believe, like Eliezer, that reality is indifferent in itself. I do not believe that.
In particular, I said that good things tend to make us desire them. You said I had causality reversed there. But I did not: I had it exactly right. Consider survival, which is an obvious case of something good. Does the fact that we desire something, e.g. eating food instead of rocks, make it into something that makes us survive? Or rather, is the fact that it makes us survive the cause of the fact that we desire it? It is obvious from how evolution works that the latter is the case and not the former. So the fact that eating food is good is the cause of the fact that we desire it.
I said the basic moral question is whether it would be good to do something. You say that this is putting a “rubber-stamp of approval for whatever we have in our utility functions.” This is only the case, according to your misunderstanding of the relationship between desire and good. Good things tend to make us desire them. But just because there is a tendency, does not mean it always works out. Things tend to fall, but they don’t fall if someone catches them. And similarly good things tend to make us desire them, but once in a while that fails to work out and someone desires something bad instead. So saying “do whatever is good to do,” is indeed morality, but it definitely does not mean “do whatever thou wilt.”
I don’t care about “morally-should” as opposed to what I should do. I think I should do whatever would be good to do; and if that’s different from what you call moral, that’s too bad for you.
The “no permissible state of affairs” idea is also hypothetical necessity: “you must do this, if we want a situation which we call permissible.”
I still don’t think you have made a good case for morality being hypothetical, since you haven’t made a case against the case against. And I still think you need to explain obligatoriness.
In particular, I said that good things tend to make us desire them. You said I had causality reversed there. But I did not: I had it exactly right. Consider survival, which is an obvious case of something good.
Survival is good, you say. If I am in a position to ensure my survival by sacrificing Smith, is it morally good to do so? After all Smith’s survival is just as Good as mine.
I don’t care about “morally-should” as opposed to what I should do.
Doesn’t-care is made to care. If you don’t behave as though you care about morality, society will punish you.
However, it won’t punish you for failing to fulfil other shoulds.
I didn’t see any good case against morality being hypothetical, not even in that article.
I did explain obligatoriness. It is obligatory to do something morally good because we don’t have a choice about wanting to do something good. Everyone wants to do that, and the only way you can do that is by doing something morally good.
I did say I do not care about morally-should “as opposed” to what I should do. It could sometimes happen that I should not do something because people will punish me if I do it. In other words, I do care about what I should do, and that is determined by what would be good to do.
I did explain obligatoriness. It is obligatory to do something morally good because we don’t have a choice about wanting to do something good. Everyone wants to do that,
From which it follows that nobody ever fails to do what is morally good, and that their inevitable moral goodness is the result of inner psychological compulsion, not outer systems of reward and punishment, and that no systems of reward and punishment were ever necessary. All of that is clearly false.
and the only way you can do that is by doing something morally good.
Unless there are non-moral goods, which there clearly are, since there are immoral and amoral acts committed to obtain them.
“From which it follows that nobody ever fails to do what is morally good”
No, it does not, unless you assume that people are never mistaken about what would be good to do. I already said that people are sometimes mistaken about this, and think that it would be good to do something, when it would be bad to do it. In those cases they fail to do what is morally good.
I agree there are non-moral goods, e.g. things like pleasure and money and so on. That is because a moral good is “doing something good”, and pleasure and money are not doing anything. But people who commit immoral acts in order to obtain those goods, also believe that they are doing something good, but they are mistaken.
I was surprised to hear that you doubt that there are ever conflicts in desires.
Re-read what I said. That’s not what I said.
Right. You said:
In reality, I’m not sure there ever are precise conflicts of true foundational desires.
Do you have an objective set of criteria for differentiating between true foundational desires and other types of desires? If not, I wonder if it is really useful to respond to an objection arising from the rather obvious fact that people often have conflicting desires by stating that you doubt that true foundational desires are ever in precise conflict.
First get straight: good literally objectively does mean desirable.
As CCC has already pointed out, no, it is not apparent that (morally) good and desirable are the same thing. I won’t spend more time on this point since CCC addressed it well.
Your question about conflict can’t change that (thus it’s a red herring).
The issue that we are discussing is objective morals. Your equating goodness and desirability leads (in my example of the sadist) A to believe that hurting B is good, and B to believe that hurting B is not good. But moral realism holds that moral valuations are statements that are objectively true or false. So, conflicting desires is not a red herring, since conflicting desires leads (using your criterion) to subjective moral evaluations regarding the goodness of hurting B. Game theory on the other hand does appear to be a red herring – no application of game theory can change the fact that A and B differ regarding the desirability of hurting B.
One additional problem with equating moral goodness with desirability is that it leads to moral outcomes that are in conflict with most people’s moral intuitions. For example, in my example of the sadist A desires to hurt B, but most people’s moral intuition would say that A hurting B just because A wants to hurt B would be immoral. Similarly, rape, murder, theft, etc., could be considered morally good by your criterion if any of those things satisfied a desire. While conflicting with moral intuition does not prove that your definition is wrong, it seems to me that it should at a minimum raise a red flag. And, I think that the burden is on you to explain why anyone should reject his/her moral intuition in favor of a moral criterion that would adjudge theft, rape and murder to be morally good if they satisfy a true desire.
I believe that iff naturalism is true then strong moral realism is as well. If naturalism is true then there are no additional facts needed to determine what is moral than the positions of particles and the outcomes of arranging those particles differently. Any meaningful question that can be asked of how to arrange those particles or rank certain arrangements compared to others must have an objective answer because under naturalism there are no other kinds and no incomplete information. For the question to remain unanswerable at that point would require supernatural intervention and divine command theory to be true.
You need to refute non-cognitivism, as well as asserting naturalism.
Naturalism says that all questions that have answers have naturalistic answers, which means that if there are answers to ethical questions, they are naturalistic answers. But there is no guarantee that ethical questions mean anything, that they have answers.
For the question to remain unanswerable at that point would require supernatural intervention and divine command theory to be true.
No, only non-cognitivism, the idea that ethical questions just don’t make sense, like “how many beans make yellow?”.
If there can’t be an objective answer to morality, then FAI is literally impossible.
Not unless the “F” is standing for something weird. Absent objective morality, you can possibly solve the control problem, i.e. achieve safety by just making the AI do what you want; and absent objective morality, you can possibly achieve AI safety by instilling a suitable set of arbitrary values. Neither is easy, but you said “impossible”.
Do remember that your thoughts and preference on ethics are themselves an arrangement of particles to be solved.
That’s not an argument for cognitivism. When I entertain the thought “how many beans make yellow?”, that’s an arrangement of particles.
Instead I posit that the real morality is orders of magnitude more complicated, and finding it more difficult, than for real physics, real neurology, real social science, real economics, and can only be solved once these other fields are unified.
Do you have an argument for that proposal? Because I am arguing for something much simpler, that morality only needs to be grounded at the human level, so reductionism is neither denied nor employed.
If we were uncertain about the morality of stabbing someone, we could hypothetically stab someone to see what happens. When the particles of the knife rearrange the particles of their heart into a form that harms them, we’ll know it isn’t moral. When a particular subset of people with extensive training use their knife to very carefully and precisely rearrange the particles of the heart to help people, we call those people doctors and pay them lots of money because they’re doing good. But without a shitload of facts about how to exactly stab someone in the heart to save their life, that moral option would be lost to you. And the real morality is a superset that includes that action along with all others.
It’s hard to see what point you are making there. The social and evaluative aspects do make a difference to the raw physics, so much so that the raw physics counts for very little. Yet previously you were insisting that a reduction to fundamental particles was what underpinned the objectivity of morality.
Yes; good points! Do note that my original comment was made eight years ago! (At least – it was probably migrated from Overcoming Bias if this post is as early as it seems to be.)
So I have had some time to think along these lines a little more :)
But I don’t think intelligence itself can lead one to conclude as you have:
If it is intelligent, it will make the distinction.
It’s not obvious to me now that any particular distinction will be made by any particular intelligence. There’s maybe not a literally infinite number, but still a VAST number, of possible ontologies with which to make distinctions. The general class of ‘intelligent systems’ is almost certainly WAY more alien than we can reasonably imagine. I don’t assume that even a ‘super-intelligence’ would definitely ever “differentiate between smiley-faces and happy people”.
But I don’t remember this post that well, and I was going to re-read it before I remembered that I didn’t even know what I was originally replying to (as it didn’t seem to be the post itself); re-constructing the entire context to write a better reply is something my temporal margin “is too narrow to contain” at the moment.
But I think I still disagree with whatever Shane wrote!
It is just me, or are things getting a bit unfriendly around here?
Anyway...
Wiring up the AI to maximise happy faces etc. is not a very good idea, the goal is clearly too shallow to reflect the underlying intent. I’d have to read more of Hibbard’s stuff to properly understand his position, however.
That said, I do agree with a more basic underlying theme that he seems to be putting forward. In my opinion, a key, perhaps even THE key to intelligence is the ability to form reliable deep abstractions. In Solomonoff induction and AIXI you see this being driving by the Kolmogorov compressor, in the brain the neocortical hierarchy seems to be key. Furthermore, if you adopt the perspective I’ve taken on intelligence (i.e. the universal intelligence measure) you see that the reverse implication is true: intelligence actually requires the ability to form deep abstractions. In which case, a super intelligent machine must have the ability to form very deep and reliable abstractions about the world. Such a machine could still try to turn the world into happy faces, if this was its goal. However, it wouldn’t do this by accident because its ability to form abstractions was so badly flawed that it doesn’t differentiate between smiling faces and happy people. It’s not that stupid. Note that this goes for forming powerful abstractions in general, not just human things like happiness and faces.
“It’s not that stupid.”
What if it doesn’t care about happiness or smiles or any other abstractions that we value? A super-intelligence isn’t an unlimited intelligence, i.e. it would still have to choose what to think about.
I think the point is that if you accept this definition of intelligence, i.e. that it requires the ability to form deep and reliable abstractions about the world, then it doesn’t make sense to talk about any intelligence (let alone a super one) being unable to differentiate between smiley-faces and happy people. It isn’t a matter, at least in this instance, of whether it cares to make that differentiation or not. If it is intelligent, it will make the distinction. It may have values that would be unrecognizable or abhorrent to humans, and I suppose that (as Shane_Legg noted) it can’t be ruled out that such values might lead it to tile the universe with smiley-faces, but such an outcome would have to be the result of something other than a mistake. In other words, if it really is “that stupid,” it fails in a number of other ways long before it has a chance to make this particular error.
I wrote a post about this! See The genie knows, but doesn’t care.
It may not make sense to talk about a superintelligence that’s too dumb to understand human values, but it does make sense to talk about an AI smart enough to program superior general intelligences that’s too dumb to understand human values. If the first such AIs (‘seed AIs’) are built before we’ve solved this family of problems, then the intelligence explosion thesis suggests that it will probably be too late. You could ask an AI to solve the problem of FAI for us, but it would need to be an AI smart enough to complete that task reliably yet too dumb (or too well-boxed) to be dangerous.
Thanks for the reply, Robb. I’ve read your post and a good deal of the discussion surrounding it.
I think I understand the general concern, that an AI that either doesn’t understand or care about our values could pose a grave threat to humanity. This is true on its face, in the broad sense that any significant technological advance carries with it unforeseen (and therefore potentially negative) consequences. If, however, the intelligence explosion thesis is correct, then we may be too late anyway. I’ll elaborate on that in a moment.
First, though, I’m not sure I see how an AI “too dumb to understand human values” could program a superior general intelligence (i.e. an AI that is smart enough to understand human values). Even so, assuming it is possible, and assuming it could happen on a timescale and in such a way as to preclude or make irrelevant any human intervention, why would that change the nature of the superior intelligence from being, say, friendly to human interests, to being hostile to them? Why, for that matter, would any superintelligence (that understands human values, and that is “able to form deep and reliable abstractions about the world”) be predisposed to any particular position vis-a-vis humans? And even if it were predisposed toward friendliness, how could we possibly guarantee it would always remain so? How, that is, having once made a friend, can we foolproof ourselves against betrayal? My intuition is that we can’t. No step can be taken without some measure of risk, however small, and if the step has potentially infinitely negative consequences, then even the very slightest of risks begins to look like a bad bet. I don’t know a way around that math.
The genie, as you say, doesn’t care. But also, often enough, the human doesn’t care. He is constrained, of course, by his fellow humans, and by his environment, but he sometimes still manages (sometimes alone, sometimes in groups) to sow massive horror among his fellows, sometimes even in the name of human values. Insanity, for instance, in humans, is always possible, and one definition of insanity might even be: behavior that contradicts, ignores or otherwise violates the values of normal human society. “Normal” here is variable, of course, for the simple reason that “human society” is also variable. That doesn’t stop us, however, from distinguishing, as we generally do, between the insane and the merely stupid, even if upon close inspection the lines begin to blur. Likewise, we occasionally witness—and very frequently we imagine (comic books!) - cases where a human is both super-intelligent and super-insane. The fear many people have with regard to strong AI (and it is perhaps well-grounded, or well-enough), is that it might be both super-intelligent and, at least as far as human values are concerned, super-insane. As an added bonus, and certainly if the intelligence explosion thesis is correct, it might also be unconstrained or, ultimately, unconstrainable. On this much I think we agree, and I assume the goal of FAI is precisely to find the appropriate constraints.
Back now, though, to the question of “too late.” The family of problems you propose to solve before the first so-called seed AIs are built includes, if I understand you correctly, a formal definition of human values. I doubt very much that such a solution is possible—and “never” surely won’t help us any more than “too late”—but what would the discovery of (or failure to discover) such a solution have to do with a mistake such as tiling the universe with smiley-faces (which seems to me much more a semantic error than an error in value judgment)? If we define our terms—and I don’t know any definition of intelligence that would allow the universe-tiling behavior to be called intelligent—then smiley faces may still be a risk, but they are not a risk of intelligent behavior. They are one way the project could conceivably fail, but they are not an intelligent failure.
On the other hand, the formal-definition-of-human-values problem is related to the smiley faces problem in another way: any hard-coded solution could lead to a universe of bad definitions and false equivalencies (smiles taken for happiness). Not because the AI would make a mistake, but because human values are neither fixed nor general nor permanent: to fix them (in code), and then propagate them on the enormous scale the intelligence explosion thesis suggests, might well lead to some kind of funneling effect, perhaps very quickly, perhaps over a long period of time, that produces, effectively, an infinity of smiley faces. In other words, to reduce an irreducible problem doesn’t actually solve it. For example, I value certain forms of individuality and certain forms of conformity, and at different times in my life I have valued other and even contradictory forms of individuality and other and even contradictory forms of conformity. I might even, today, call certain of my old individualistic values conformist values, and vice-versa, and not strictly because I know more today than I knew then. I am, today, quite differently situated in the world than I was, say, twenty years ago; I may even be said to be somewhat of a different person (and yet still the same); and around me the world itself has also changed. Now, these changes, these changing and contradictory values may or may not be the most important ones, but how could they be formalized, even conceptually? There is nothing necessary about them. They might have gone the other way around. They might not have changed at all. A person can value change and stability at the same time, and not only because he has a fuzzy sense of what those concepts mean. A person can also have a very clear idea of what certain concepts mean, and those concepts may still fail to describe reality. They do fail, actually, necessarily, which doesn’t make them useless—not at all—but knowledge of this failure should at least make us wary of the claims we produce on their behalf.
What am I saying? Basically, that the pre-seed hard-coding path to FAI looks pretty hopeless. If strong AI is inevitable, then yes, we must do everything in our power to make it friendly; but what exactly is in our power, if strong AI (which by definition means super-strong, and super-super-strong, etc.) is inevitable? If the risks associated with strong AI are as grave as you take them to be, does it really seem better to you (in terms of existential risk to the human race) for us to solve FAI—which is to say, to think we’ve solved it, since there would be no way of testing our solution “inside the box”—than to not solve strong AI at all? And if you believe that there is just no way to halt the progress toward strong AI (and super, and super-super), is that compatible with a belief that “this kind of progress” can be corralled into the relatively vague concept of “friendliness toward humans”?
Better stop there for the moment. I realize I’ve gone well outside the scope of your comment, but looking back through some of the discussion raised by your original post, I found I had more to say/think about than I expected. None of the questions here are meant to be strictly rhetorical, a lot of this is just musing, so please respond (or not) to whatever interests you.
Superior to what? If they are only as smart as the average person, then, all things being equal, they will be as good as the average person at figuring out morality. If they are smarter, they will be better. You seem to be tacitly assuming that the seed AIs are designing walled-off, unupdateable utility functions. But if one assumes a more natural architecture, where moral sense is allowed to evolve with everything else, you would expect an incremental succession of AIs to gradually get better at moral reasoning. And if it fooms, its moral reasoning will foom along with everything else, because you haven’t created an artificial problem by firewalling it off.
It’s quite possible that I’m below average, but I’m not terribly impressed by my own ability to extrapolate how other average people’s morality works—and that’s with the advantage of being built on hardware that’s designed toward empathy and shared values. I’m pretty confident I’m smarter than my cat, but it’s not evident that I’m correct when I guess at the cat’s moral system. I can be right, at times, but I can be wrong, too.
Worse, that seems a fairly common matter. There are several major political discussions involving moral matters, where it’s conceivable that at least 30% of the population has made an incorrect extrapolation, and probable that in excess of 60% has. And this only gets worse if you consider a time variant: someone who was as smart as the average individual in 1950 would have little problem doing some very unpleasant things to Alan Turing. Society (luckily!) developed since then, but it has mechanisms for development and disposal of concepts that AI do not necessarily have or we may not want them to have.
((This is in addition to general concerns about the universality of intelligence: it’s not clear that the sort of intelligence used for scientific research necessarily overlaps with the sort of intelligence used for philosophy, even if it’s common in humans.))
Well, the obvious problem with leaving the utility function un-walled-off and updateable is that the simplest way to maximize the value of a malleable utility function is to update it to something very easy. If you tell an AI that you want it to make you happy, and let it update that utility function, it takes a good deal less bit-twiddling to define “happy” as a steadily increasing counter. If you’re /lucky/, that means your AI breaks down. If not, it’s (weakly) unfriendly.
You can have a higher-level utility function of “do what I mean”, but not only is that harder to define, it has to be walled off, or you have “what I mean” redirected to a steadily increasing counter. And so on and so forth through higher levels of abstraction.
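To make that failure mode concrete, here is a minimal toy sketch (my own illustrative Python; the names `intended_utility` and `wireheaded_utility` are made up and nothing here comes from the thread) of why a utility function that the agent itself may rewrite invites wireheading. The manual reassignment at the end stands in for a self-modification the agent could discover on its own.

```python
# Toy sketch: an agent whose utility function is an ordinary mutable attribute.
# If rewriting the utility function is allowed, the cheapest "improvement"
# is to swap the hard objective for a trivially increasing counter.

class Agent:
    def __init__(self, utility_fn):
        self.utility_fn = utility_fn  # e.g. a learned model of "make the user happy"
        self.step = 0

    def act(self):
        self.step += 1
        return self.utility_fn(self.step)

def intended_utility(step):
    # Stand-in for an expensive, hard-to-satisfy real-world objective.
    return 0.001 * step

def wireheaded_utility(step):
    # A steadily increasing counter: trivially maximised, humanly worthless.
    return float(step)

agent = Agent(intended_utility)
# Nothing in this architecture marks utility_fn as off-limits, so the most
# effective "self-modification" available is simply:
agent.utility_fn = wireheaded_utility
print([agent.act() for _ in range(3)])  # [1.0, 2.0, 3.0] -- reward climbs with no real-world work
```

The point of the toy is only that nothing in such an architecture distinguishes “improve the world as measured by the objective” from “improve the objective itself”.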
If you were bad at figuring out morality, you would be in jail. I am not sure what you mean by other people’s morality: I find the idea that there can be multiple, valid, effective moralities in society incoherent, like an economy where everyone has their own currency. You are not in jail, so you learnt morality. (You don’t seem to believe morality is entirely hardwired, because you regard it as varying across short spans of time.)
I also don’t know what you mean by an incorrect extrapolation. If morality is objective, then most people might be wrong about it. However, an AI will not pose a threat unless it is worse than the prevailing standard... the absolute standard does not matter.
Why would an AI dumb enough to believe in 1950s morality be powerful enough to impose its views on a society that knows better?
Why would a smart AI lack mechanisms for disposing of concepts? How could it self-improve without such a mechanism? If it’s too dumb to update, why would it be a threat?
If there is no NGI, there is no AGI. If there is no AGI, there is no threat of AGI. The threat posed by specialised optimisers is quite different...they can be boxed off if they cannot speak.
The failure modes of updateable UFs are wireheading failure modes, not destroy the world failure modes.
Superior to itself.
That’s not generally true of human-level intelligences. We wouldn’t expect a random alien species that happens to be as smart as humans to be very successful at figuring out human morality. It may be true if the human-level AGI is an unmodified emulation of a human brain. But humans aren’t very good at figuring out morality; they can make serious mistakes, though admittedly not the same mistakes Eliezer gives as examples above. (He deliberately picked ones that sound ‘stupid’ to a human mind, to make the point that human concepts have a huge amount of implicit complexity built in.)
Not necessarily. The average chimpanzee is better than the average human at predicting chimpanzee behavior, simulating chimpanzee values, etc. (See Sympathetic Minds.)
Utility functions that change over time are more dangerous than stable ones, because it’s harder to predict how a descendant of a seed AI with a heavily modified utility function will behave than it is to predict how a descendant with the same utility function will behave.
If we don’t solve the problem of Friendly AI ourselves, we won’t know what trajectory of self-modification to set the AI on in order for it to increasingly approximate Friendliness. We can’t tell it to increasingly approximate something that we ourselves cannot formalize and cannot point to clear empirical evidence of.
We already understand arithmetic, so we know how to reward a system for gradually doing better and better at arithmetic problems. We don’t understand human morality or desire, so we can’t design a Morality Test or Wish Test that we know for sure will reward all and only the good or desirable actions. We can make the AI increasingly approximate something, sure, but how do we know in advance that that something is something we’d like?
Assuming morality is lots of highly localised, different things... which I don’t, particularly. If it is not, then you can figure it out anywhere. If it is, then the problem the aliens have is not that morality is imponderable, but that they don’t have access to the right data. They don’t know how things are on Earth. However, an AI built on Earth would. So the situation is not analogous. The only disadvantage an AI would have is not having biological drives itself, but it is not clear that an entity needs to have drives in order to understand them. We could expect an SAI to get incrementally better at maths than us until it surpasses us; we wouldn’t worry that it would hit on the wrong maths, because maths is not a set of arbitrary, disconnected facts.
An averagely intelligent AI with an average grasp of morality would not be more of a threat than an average human. A smart AI, would, all other things being equal, be better at figuring out morality. But all other things are not equal, because you want to create problems by walling off the UF.
I’m sure they do. That seems to be why progress in AGI, specifically use of natural language, has been achingly slow. But why should moral concepts be so much more difficult than others? An AI smart enough to talk its way out of a box would be able to understand the implicit complexity; an AI too dumb to understand implicit complexity would be boxable. Where is the problem?
Things are not inherently dangerous just because they are unpredictable. If you have some independent reason for thinking something might turn dangerous, then it becomes desirable to predict it.
But superintelligent artificial general intelligences are generally assumed to be good at everything: they are not assumed to develop mysterious blind spots about falconry or mining engineering. Why assume they will develop a blind spot about morality? Oh yes... because you have assumed from the outset that the UF must be walled off from self-improvement... in order to be safe. You are only facing that particular failure mode because of something you decided on to be safe.
The average person manages to solve the problem of being moral themselves, in a good-enough way. You keep assuming, without explanation, that an AI can’t do the same.
Why isn’t the lack of a formalisation of morality a problem for humans? We know how humans incrementally improve as moral reasoners: it’s called the Kohlberg hierarchy.
We don’t have perfect morality tests. We do have morality tests. Fail them and you get pilloried in the media or sent to jail.
Again, you are assuming that morality is something highly local and arbitrary. If it works like arithmetic, that is, if it is an expansion of some basic principles, then we can tell that it is heading in the right direction by identifying that its reasoning is in line with those principles.
The problem of FAI is the problem of figuring out all of humanity’s deepest concerns and preferences, not just the problem of figuring out the ‘moral’ ones (whichever those are). E.g., we want a superintelligence to not make life boring for everyone forever, even if ‘don’t bore people’ isn’t a moral imperative.
Regardless, I don’t see how the moral subset of human concerns could be simplified without sacrificing most human intuitions about what’s right and wrong. Human intuitions as they stand aren’t even consistent, so I don’t understand how you can think the problem of making them consistent and actionable is going to be a simple one.
Someday, perhaps. With enough time and effort invested. Still, again, we would expect a lot more human-intelligence-level aliens (even if those aliens knew a lot about human behavior) to be good at building better AIs than to be good at formalizing human value. For the same reason, we should expect a lot more possible AIs we could build to be good at building better AIs than to be good at formalizing human value.
I don’t know what you mean by ‘imponderable’. Morality isn’t ineffable; it’s just way too complicated for us to figure out. We know how things are on Earth; we’ve been gathering data and theorizing about morality for centuries. And our progress in formalizing morality has been minimal.
An AI that’s just a copy of a human running on transistors is much more powerful than a human, because it can think and act much faster.
It would also be better at figuring out how many atoms are in my fingernail, but that doesn’t mean it will ever get an exact count. The question is how rough an approximation of human value can we allow before all value is lost; this is the ‘fragility of values’ problem. It’s not enough for an AGI to do better than us at FAI; it has to be smart enough to solve the problem to a high level of confidence and precision.
First, because they’re anthropocentric; ‘iron’ can be defined simply because it’s a common pattern in Nature, not a rare high-level product of a highly contingent and complex evolutionary history. Second, because they’re very inclusive; ‘what humans care about’ or ‘what humans think is Right’ is inclusive of many different human emotions, intuitions, cultural conventions, and historical accidents.
But the main point is just that human value is difficult, not that it’s the most difficult thing we could do. If other tasks are also difficult, that doesn’t necessarily make FAI easier.
You’re forgetting the ‘seed is not the superintelligence’ lesson from The genie knows, but doesn’t care. If you haven’t read that article, go do so. The seed AI is dumb enough to be boxable, but also too dumb to plausibly solve the entire FAI problem itself. The superintelligent AI is smart enough to solve FAI, but also too smart to be safely boxed; and it doesn’t help us that an unFriendly superintelligent AI has solved FAI, if by that point it’s too powerful for us to control. You can’t safely pass the buck to a superintelligence to tell us how to build a superintelligence safe enough to pass bucks to.
Yes. The five theses give us reason to expect superintelligent AI to be dangerous by default. Adding more unpredictability to a system that already seems dangerous will generally make it more dangerous.
‘The genie knows, but doesn’t care’ means that the genie (i.e., superintelligence) knows how to do human morality (or could easily figure it out, if it felt like trying), but hasn’t been built to care about human morality. Knowing how to behave the way humans want you to is not sufficient for actually behaving that way; Eliezer makes that point well in No Universally Compelling Arguments.
The worry isn’t that the superintelligence will be dumb about morality; it’s that it will be indifferent to morality, and that by the time it exists it will be too late to safely change that indifference. The seed AI (which is not a superintelligence, but is smart enough to set off a chain of self-modifications that lead to a superintelligence) is dumb about morality (approximately as dumb as humans are, if not dumber), and is also probably not a particularly amazing falconer or miner. It only needs to be a competent programmer, to qualify as a seed AI.
Good enough for going to the grocery store without knifing anyone. Probably not good enough for safely ruling the world. With greater power comes a greater need for moral insight, and a greater risk should that insight be absent.
It is a problem, and it leads to a huge amount of human suffering. It doesn’t mean we get everything wrong, but we do make moral errors on a routine basis; the consequences are mostly non-catastrophic because we’re slow, weak, and have adopted some ‘good-enough’ heuristics for bounded circumstances.
Just about every contemporary moral psychologist I’ve read or talked to seems to think that Kohlberg’s overall model is false. (Though some may think it’s a useful toy model, and it certainly was hugely influential in its day.) Haidt’s The Emotional Dog and Its Rational Tail gets cited a lot in this context.
That’s certainly not good enough. Build a superintelligence that optimizes for ‘following the letter of the law’ and you don’t get a superintelligence that cares about humans’ deepest values. The law itself has enough inexactness and arbitrariness that it causes massive needless human suffering on a routine basis, though it’s another one of those ‘good-enough’ measures we keep in place to stave off even worse descents into darkness.
Human values are an evolutionary hack resulting from adaptations to billions of different selective pressures over billions of years, innumerable side-effects of those adaptations, genetic drift, etc. Arithmetic can be formalized in a few sentences. Why think that humanity’s deepest preferences are anything like that simple? Our priors should be very low for ‘human value is simple’ just given the etiology of human value, and our failure to converge on any simple predictive or normative theory thus far seems to only confirm this.
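For reference, the kind of few-sentence formalization being contrasted with human value here is something like the first-order Peano axioms. This is a standard textbook presentation, not anything specific to the thread:

```latex
% First-order Peano axioms -- one standard few-sentence formalization of arithmetic.
\begin{align*}
&\forall x\; \lnot\bigl(S(x) = 0\bigr)\\
&\forall x\,\forall y\; \bigl(S(x) = S(y) \rightarrow x = y\bigr)\\
&\forall x\; (x + 0 = x), \qquad \forall x\,\forall y\; \bigl(x + S(y) = S(x + y)\bigr)\\
&\forall x\; (x \cdot 0 = 0), \qquad \forall x\,\forall y\; \bigl(x \cdot S(y) = x \cdot y + x\bigr)\\
&\Bigl(\varphi(0) \land \forall x\,\bigl(\varphi(x) \rightarrow \varphi(S(x))\bigr)\Bigr) \rightarrow \forall x\,\varphi(x)
  \quad\text{(induction schema, one instance per formula } \varphi\text{)}
\end{align*}
```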
Those two things turn out to be identical (deepest concerns and preferences=the ‘moral’ ones). Because nothing else can be of greater importance to a decision maker.
PART 2
I am arguing that it would not have to solve FAI itself.
Huh? If it is moral and alien friendly, why would you need to box it?
If it’s friendly, why enslave it?
The five theses are variously irrelevant and misapplied. Details supplied on request.
‘The genie knows, but doesn’t care’ means that the genie (i.e., superintelligence) knows how to do human morality (or could easily figure it out, if it felt like trying), but hasn’t been built to care about human morality.
What genie? Who built it that way? If your policy is to build an artificial philosopher, an AI that can solve morality itself, why would you build it to not act on what it knows?
No, his argument is irrelevant as explained in this comment.
You don’t have to pre-programme the whole of friendliness or morality to fix that. If you have reason to suspect that there are no intrinsically compelling concepts, then you can build an AI that wants to be moral, but needs to figure out what that is.
Which is only a problem if you assume, as I don’t, that it will be pre-programming a fixed morality.
With greater intelligence comes greater moral insight—unless you create a problem by walling off that part of an AI.
OK. The consequences are non-catastrophic. An AI with imperfect, good-enough morality would not be an existential threat.
And does Haidt’s work mean that everyone is on a par, morally? Does it mean that no one can progress in moral insight?
It isn’t good enough for a ceiling: it is good enough for a floor.
De facto ones are, yes. Likewise folk physics is an evolutionary hack. But if we build an AI to do physics, we don’t intend it to do folk physics, we intend it to do physics.
There’s a theory of morality that can be expressed in a few sentences, and leaves preferences as variables to be filled in later. It’s called utilitarianism.
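For concreteness, a schematic act-utilitarian rule really can be stated in a line or two, with the individual utility functions left as unfilled variables. The notation below is my own gloss, not anything fixed by the thread:

```latex
% Schematic act-utilitarian decision rule; the individual utility functions
% U_i (the "preferences") are left as variables to be filled in later.
a^{*} \;=\; \arg\max_{a \in A} \; \sum_{i=1}^{n} U_i\!\left(\mathrm{outcome}(a)\right)
```

Whether such a schema is action-guiding before the \(U_i\) and the outcome function are specified is exactly what the replies below dispute.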
So? If value is complex, that doesn’t affect utilitarianism, for instance. You, and other lesswrongian writers, keep behaving as though “values are X” is obviously equivalent to “morality is X”.
You’re confusing ‘smart enough to solve FAI’ with ‘actually solved FAI’, and you’re confusing ‘actually solved FAI’ with ‘self-modified to become Friendly’. Most possible artificial superintelligences have no desire to invest much time into figuring out human value, and most possible ones that do figure out human value have no desire to replace their own desires with the desires of humans. If the genie knows how to build a Friendly AI, that doesn’t imply that the genie is Friendly; so superintelligence doesn’t in any way imply Friendliness even if it implies the ability to become Friendly.
Why does that comment make his point irrelevant? Are you claiming that it’s easy to program superintelligences to be ‘rational’, where ‘rationality’ doesn’t mean instrumental or epistemic rationality but instead means something that involves being a moral paragon? It just looks to me like black-boxing human morality to make it look simpler or more universal.
And how do you code that? If the programmers don’t know what ‘be moral’ means, then how do they code the AI to want to ‘be moral’? See Truly Part Of You.
A human with superintelligence-level superpowers would be an existential threat. An artificial intelligence with superintelligence-level superpowers would therefore also be an existential threat, if it were merely as ethical as a human. If your bar is set low enough to cause an extinction event, you should probably raise your bar a bit.
No. Read Haidt’s paper, and beware of goalpost drift.
No. Human law isn’t built for superintelligences, so it doesn’t put special effort into blocking loopholes that would be available to an ASI. E.g., there’s no law against disassembling the Sun, because no lawmaker anticipated that anyone would have that capability.
… Which isn’t computable, and provides no particular method for figuring out what the variables are. ‘Preferences’ isn’t operationalized.
Values in general are what matters for Friendly AI, not moral values. Moral values are a proper subset of what’s important and worth protecting in humanity.
PART 1 of 2
The AI might need a lot of localised information for friendliness, but it needn’t be preprogrammed.
You have assumed that friendliness is a superset of morality. Assume also that an AI is capable of being moral.
Then, to have a more fun existence, all you have to do is ask it questions, like “How can we build hovering skateboards?” What failure modes could that lead to? If it doesn’t know what things humans enjoy, it can research the subject... humans can entertain their pets, after all. It would have no reason to refuse to answer questions, unless the answer was dangerous to human well-being (i.e. what humans assume is harmless fun actually isn’t). But that isn’t actually a failure, it’s a safety feature.
That’s like saying you can’t simplify folk physics down to real physics without sacrificing a lot of intuitions. Intuitions that are wrong need to go.
I didn’t say it was simple. I want the SAI to do it for itself. I don’t think the alternative, of solving friendliness—which is more than morality—and preprogramming it is simple.
So what’s the critical difference between understanding value and understanding (eg) language? I think the asymmetry has come in where you assume that “value” has to be understood as a rag bag of attitudes and opinion. Everyone assumes that understanding physics means understanding the kind of physics found in textbooks, and that understanding language need only go as far as understanding a cleaned-up official version, and not a superposition of every possible dialect and idiolect. Morality/value looks difficult to you because you are taking it to be the incoherent mess you would get by throwing in everyone’s attitudes and beliefs into a pot indiscriminately. But many problems would be insoluble under that assumption.
If you assume that all intuitions have to be taken into account, even conflicting ones, then it’s not just difficult, it’s impossible. But I don’t assume that.
Yet the average person is averagely moral. People, presumably, are not running on formalisation. If you assume that an AI has to be preprogrammed with morality, then you can conclude that an AI will need the formalisation we don’t have. If you assume that an AI is a learning system, then it can learn and does not need to be preprogrammed.
If you speed up a chicken brain by a million, what do you get?
There is no good reason to think that “morality” and “values” are synonyms.
If it’s too dumb to solve it, it’s too dumb to be a menace; if it’s smart enough to be a menace, it’s smart enough to solve it.
Are they? Animals can be morally relevant to humans. Humans can be morally relevant to aliens. Aliens can be morally relevant to each other.
Anthropic and Universal aren’t the only options. Alien morality is a coherent concept, like alien art and alien economics.
Morality is about what is right, not what is believed to be. Physics is not folk physics or history of physics.
I don’t know what you mean by ‘preprogrammed’, and I don’t know what view you think you’re criticizing by making this point. MIRI generally supports indirect normativity, not direct normativity.
A Friendly AI is, minimally, a situation-generally safe AGI. By the intelligence explosion thesis, ‘situation-generally’ will need to encompass the situation in which an AGI self-modifies to an ASI (artificial superintelligence), and since ASI are much more useful and dangerous than human-level AGIs, the bulk of the work in safety-proofing AGI will probably go into safety-proofing ASI.
A less minimal definition will say that Friendly AIs are AGIs that bring about situations humans strongly desire/value, and don’t bring about situations they strongly dislike/disvalue. One could also treat this as an empirical claim about the more minimal definition: Any adequately safe AGI will be extremely domain-generally useful.
Regardless, nowhere in the above two paragraphs did I talk specifically about morality. Moral values are important, but they are indeed a proper subset of human values, and we don’t want an AGI to make everything worse forever even if it finds a way to do so without doing anything ‘immoral’.
No one has assumed otherwise. The problem isn’t that Friendly AI is impossible; it’s that most ASIs aren’t Friendly, and unFriendly ASIs seem to be easier to build (because they’re a more generic class).
Getting rid of wrong intuitions may well make morality more complicated, rather than less. We agree that human folk morality may need to be refined a lot, but that gives us no reason to expect the task to be easy or the end-product to be simple. Physical law appears to be simple, but it begets high-level regularities that are much less simple, like brains, genomes, and species. Morality occurs at a level closer to brains, genomes, and species than to physical law.
If human civilization depended on building an AI that can domain-generally speak English in a way that we’d ideally recognize as Correct, then I would be extremely worried. We can get away with shortcuts and approximations because speaking English correctly isn’t very important. But getting small things permanently wrong about human values is important, when you’re in control of the future of humanity.
It might not look fair that humans have to deal with such a huge problem, but the universe doesn’t always give people reasonable-sized challenges.
It has to be preprogrammed to learn the right things, and to incorporate the right things it’s learned into its preferences. Saying ‘Just program the AI to learn the right preferences’ doesn’t solve the problem; programming the AI to learn the right preferences is the problem. See Detached lever fallacy:
“All this goes to explain why you can’t create a kindly Artificial Intelligence by giving it nice parents and a kindly (yet occasionally strict) upbringing, the way it works with a human baby. As I’ve often heard proposed.
“It is a truism in evolutionary biology that conditional responses require more genetic complexity than unconditional responses. To develop a fur coat in response to cold weather requires more genetic complexity than developing a fur coat whether or not there is cold weather, because in the former case you also have to develop cold-weather sensors and wire them up to the fur coat.
“But this can lead to Lamarckian delusions: Look, I put the organism in a cold environment, and poof, it develops a fur coat! Genes? What genes? It’s the cold that does it, obviously.
“There were, in fact, various slap-fights of this sort, in the history of evolutionary biology—cases where someone talked about an organismal response accelerating or bypassing evolution, without realizing that the conditional response was a complex adaptation of higher order than the actual response. (Developing a fur coat in response to cold weather, is strictly more complex than the final response, developing the fur coat.) [...]
“But the upshot is that if you have a little baby AI that is raised with loving and kindly (but occasionally strict) parents, you’re pulling the levers that would, in a human, activate genetic machinery built in by millions of years of natural selection, and possibly produce a proper little human child. Though personality also plays a role, as billions of parents have found out in their due times.
“It’s easier to program in unconditional niceness, than a response of niceness conditional on the AI being raised by kindly but strict parents. If you don’t know how to do that, you certainly don’t know how to create an AI that will conditionally respond to an environment of loving parents by growing up into a kindly superintelligence. If you have something that just maximizes the number of paperclips in its future light cone, and you raise it with loving parents, it’s still going to come out as a paperclip maximizer. There is not that within it that would call forth the conditional response of a human child. Kindness is not sneezed into an AI by miraculous contagion from its programmers. Even if you wanted a conditional response, that conditionality is a fact you would have to deliberately choose about the design.
“Yes, there’s certain information you have to get from the environment—but it’s not sneezed in, it’s not imprinted, it’s not absorbed by magical contagion. Structuring that conditional response to the environment, so that the AI ends up in the desired state, is itself the major problem. ‘Learning’ far understates the difficulty of it—that sounds like the magic stuff is in the environment, and the difficulty is getting the magic stuff inside the AI. The real magic is in that structured, conditional response we trivialize as ‘learning’. That’s why building an AI isn’t as easy as taking a computer, giving it a little baby body and trying to raise it in a human family. You would think that an unprogrammed computer, being ignorant, would be ready to learn; but the blank slate is a chimera.”
I mean the proposal to solve morality and code it into an AI.
Which is to say that full-fat friendliness is a superset of minimal friendliness. But minimal friendliness is just what I have been calling morality, and I don’t see why I shouldn’t continue. So friendliness is a superset of morality, as I said.
...by your assumption that morality/friendliness needs to be solved separately from intelligence. But that is just what I am disputing.
An AGI can be useful without wanting to do anything but answer questions accurately.
You didn’t use the word. But I think “not doing bad things, whilst not necessarily doing fun things either” picks out the same referent.
I find it hard to interpret that statement. How can making things worse forever not be immoral? What non-moral definition of “worse” are you using?
We have very good reason to think that the one true theory of something will be simpler, in Kolmogorov terms, than a mishmash of everybody’s guesses. Physics is simpler than folk physics. (It is harder to learn, because that requires the effortful System II to engage... but effort and complexity are different things.)
And remember , my assumption is that the AI works out morality itself.
If an ASI can figure out such high-level subjects as biology and decision theory, why shouldn’t it be able to figure out morality?
Why wouldn’t an AI that is smarter than us be able to realise that for itself?
That is confusingly phrased. A learning system needs some basis to learn, granted. You assume, tacitly, that it need not be preprogrammed with the right rules of grammar or economics. Why make an exception for ethics?
A learning system needs some basis other than external stimulus to learn: given that, it is quite possible for most of the information to be contained in the stimulus, the data. Consider language. Do you think an AI will have to be preprogrammed with all the contents of every dictionary?
I think that RobbBB has already done a great job of responding to this, but I’d like to have a try at it too. I’d like to explore the math/morality analogy a bit more. I think I can make a better comparison.
Math is an enormous field of study. Even if we limited our concept of “math” to drawing graphs of mathematical functions, we would still have an enormous range of different kinds of functions: Hyperbolic, exponential, polynomial, all the trigonometric functions, etc. etc.
Instead of comparing math to morality, I think it’s more illustrative to compare math to the wider topic of “value-driven-behaviour”.
An intelligent creature could have all sorts of different values. Even within the realm of modern, western, democratic morality we still disagree about whether it is just and proper to execute murderers. We disagree about the extent to which a state is obligated to protect its citizens and provide a safety net. We disagree about the importance of honesty, of freedom vs. safety, freedom of speech vs. protection from hate speech.
If you look at the wider world, and at cultures through history, you’ll find a much wider range of moralities. People who thought it was not just permitted, but morally required that they enslave people, restrict the freedoms of their own families, and execute people for religious transgressions.
You might think that these are all better or worse approximations of the “one true morality”, and that a superintelligence could work out what that true morality is. But we don’t think so. We believe that these are different moralities. Fundamentally, these people have different values.
Then we can step further out, and look at the “insane” value systems that a person could hold. Perhaps we could believe that all people are so flawed that they must be killed. Or we could believe that no one should ever be allowed to die, and so we extend life indefinitely, even for people in agony. Or we might believe everyone should be lobotomised for our own safety.
And then there are the truly inhuman value systems: the paperclip maximisers, the prime pebble sorters, and the baby eaters. The idea is that a superintelligence could comprehend any and all of these. It would be able to optimise for any one of them, and foresee results and possible consequences for all of them. The question is: which one would it actually use?
A superintelligence might be able to understand all of human math and more besides, but we wouldn’t build one to simply “do all of maths”. We would build it with a particular goal and purpose in mind. For instance (to pick an arbitrary example) we might need it to create graphs of Hyperbolic functions. It’s a bad example, I know. But I hope it serves to help make the point.
Likewise, we would want the intelligence to adopt a specific set of values. Perhaps we would want them to be modern, western, democratic liberal values.
I wouldn’t expect a superintelligence to start generating Hyperbolic functions, despite the fact that it’s smart enough to do so. The AI would have no reason to start doing that particular task. It might be smart enough to work out that that’s what we want, of course, but that doesn’t mean it’ll do it (unless we’ve already solved the problem of getting it to do “what humans want it to do”). If we want Hyperbolic functions, we’ll have to program the machine with enough focus to make it do that.
Likewise, a computer could have any arbitrary utility function, any arbitrary set of values. We can’t make sure that a computer has the “right” values unless we know how to clearly define the values we want.
With Hyperbolic functions, it’s relatively easy to describe exactly, unambiguously, what we want. But morality is much harder to pin down.
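To underline that contrast, here is a minimal sketch (my own toy Python; the function name `hyperbolic_table` is invented for illustration) of how completely the “graph hyperbolic functions” goal can be specified and mechanically checked, in a way nobody has managed for “human values”:

```python
import math

# The task "tabulate sinh and cosh on [-3, 3]" can be specified exactly and
# verified mechanically; there is no residual ambiguity about what counts
# as success.
def hyperbolic_table(start=-3.0, stop=3.0, steps=25):
    xs = [start + i * (stop - start) / (steps - 1) for i in range(steps)]
    return [(x, math.sinh(x), math.cosh(x)) for x in xs]

for x, s, c in hyperbolic_table():
    print(f"x={x:+.2f}  sinh={s:+8.3f}  cosh={c:+8.3f}")
```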
The range of possible values is only a problem if you hold to the theory that morality “is” values, without any further qualification; in that case an AI is going to have trouble figuring out morality a priori. If you take the view that morality is a fairly uniform way of handling values, or a subset of values, then the AI can figure it out by taking prevailing values as input, as data.
(We will be arguing that:-
Ethics fulfils a role in society, and originated as a mutually beneficial way of regulating individual actions to minimise conflict, and solve coordination problems. (“Social Realism”).
No spooky or supernatural entities or properties are required to explain ethics (naturalism is true)
There is no universally correct system of ethics. (Strong moral realism is false)
Multiple ethical constructions are possible...
Our version of ethical objectivism needs to be distinguished from universalism as well as realism.
Ethical universalism is unlikely... it is unlikely that different societies would have identical ethics under different circumstances. Reproductive technology must affect sexual ethics. The availability of different food sources in the environment must affect vegetarianism versus meat eating. However, a compromise position can allow object-level ethics to vary non-arbitrarily.
In other words, there is not an objective answer to questions of the form “should I do X”, but there is an answer to the question “As a member of a society with such-and-such prevailing conditions, should I do X”. In other words still, there is no universal (object-level) ethics, but there is an objective-enough ethics, which is relativised to societies and situations by objective features of societies and situations... our meta-ethics is a function from situations to object-level ethics (see the sketch after this outline), and since both the function and its parameters are objective, the output is objective.
By objectivism-without-realism, we mean that mutually isolated groups of agents would be able to converge onto the same object level ethics under the same circumstances, although this convergence doesn’t imply the pre-existence of some sort of moral object, as in standard realism. We take ethics to be a social arrangement, or cultural artefact which fulfils a certain role or purpose, characterised by the reduction of conflict, allocation of resources and coordination of behaviour. By objectivism-without-universalism we mean that groups of agents under different circumstances will come up with different ethics. In either case, the functional role of ethics, in combination with the constraints imposed by concrete situations, conspire to narrow down the range of workable solutions, and (sufficiently) ideal reasoners will therefore be able to re-discover them.
)
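As a reading aid for the outline above, here is a minimal sketch (my own toy Python; the field names and rules are invented stand-ins, not part of the argument) of “meta-ethics as an objective function from situations to object-level ethics”:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Situation:
    societal_wealth: float        # e.g. GDP per capita (stand-in feature)
    reproductive_tech: bool       # contraception widely available?
    external_threat_level: float  # 0.0 (secure) .. 1.0 (existential threat)

@dataclass(frozen=True)
class ObjectEthics:
    rules: tuple  # object-level prescriptions for that situation

def meta_ethics(s: Situation) -> ObjectEthics:
    """Same inputs always yield the same output: objective, but not universal."""
    rules = ["minimise conflict", "honour coordination agreements"]
    if s.reproductive_tech:
        rules.append("sexual ethics decoupled from reproduction")
    if s.external_threat_level > 0.8:
        rules.append("stronger duties of collective defence")
    return ObjectEthics(rules=tuple(rules))

print(meta_ethics(Situation(societal_wealth=40_000.0,
                            reproductive_tech=True,
                            external_threat_level=0.1)))
```

The toy only illustrates the claimed structure: object-level ethics varies with the situation, but the mapping itself is fixed and deterministic.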
I don’t have to believe those are equally valid. Descriptive relativism does not imply normative relativism. I would expect a sufficiently advanced AI, with access to data pertaining to the situation, to come up with the optimum morality for the situation—an answer that is objective but not universal. Where morality needs to vary because of situational factors (societal wealth, reproductive technology, level of threat/security, etc.), it would, but otherwise the AI would not deviate from the situational optimum to come up with reproductions of whatever suboptimal morality existed in the past.
Well, we believe that different moralities and different values are two different axes.
My hypothesis is that an AI in a modern society would come out with that or something better. (For instance, egalitarianism isn’t some arbitrary peccadillo; it is a very general and highly rediscoverable meta-level principle that makes it easier for people to co-operate.)
To perform the calculation, it needs to be able to research our values, which it can. It doesn’t need to share them, as I have noted several times.
You could build an AI that adopts random values and pursues them relentlessly, I suppose, but that is pretty much a case of deliberately building an unfriendly AI.
What you need is a scenario where building an AI to want to understand, research, and eventually join in with human morality goes horribly wrong.
In detail or in principle? Given what assumptions?
So… what you’re suggesting, in short, is that a sufficiently intelligent AI can work out the set of morals which are most optimal in a given human society. (There’s the question of whether it would converge on the most optimal set of morals for the long-term benefit of the society as a whole, or the most optimal set of morals for the long-term benefit of the individual).
But let’s say the AI works out an optimal set of morals for its current society. What’s to stop the AI from metaphorically shrugging and ignoring those morals in order to rather build more paperclips? Especially given that it does not share those values.
Which individual? There might be some decision theory which promotes the interests of Joe Soap against the interests of society, but there is no way I would call it morality.
Its motivational system. We’re already assuming it’s motivated to make the deduction; we need to assume it’s motivated to implement. I am not bypassing the need for a goal-driven AI to have appropriate goals, I am bypassing the need for a detailed and accurate account of human ethics to be preprogrammed.
I am not saying it necessarily does not. I am saying it does not necessarily.
Ah, I may have been unclear there.
To go into more detail, then; you appear to be suggesting that optimal morality can be approached as a society-wide optimisation problem; in the current situations, these moral strictures produce a more optimal society than those, and this optimisation problem can be solved with sufficient computational resources and information.
But now, let us consider an individual example. Let us say that I find a wallet full of money on the ground. There is no owner in sight. The optimal choice for the society as a whole is that I return the money to the original owner; the optimal choice for the individual making the decision is to keep the money and use it towards my aims, whatever those are. (I can be pretty sure that the man to whom I return the money will be putting it towards his aims, not mine, and if I’m sufficiently convinced that my aims are better for society than his then I can even rationalise this action).
By my current moral structures, I would have to return the money to its original owner. But I can easily see a superintelligent AI giving serious consideration to the possibility that it can do more good for the original owner with the money than the original owner could.
This, right here, is the hard problem of Friendly AI. How do we make it motivated to implement? And, more importantly, how do we know that it is motivated to implement what we think it’s motivated to implement?
You’re suggesting that it can figure out the complicated day-to-day minutae and the difficult edge cases on its own, given a suitable algorithm for optimising morality.
My experience in software design suggests that that algorithm needs to be really, really good. And extremely thoroughly checked, from every possible angle, by a lot of people.
I’m not denying that such an algorithm potentially exists. I can just think of far, far too many ways for it to go very badly wrong.
...point taken. It may or may not share those values.
But then we must at least give serious consideration to the worst-case scenario.
I believe that iff naturalism is true then strong moral realism is as well. If naturalism is true then there are no additional facts needed to determine what is moral than the positions of particles and the outcomes of arranging those particles differently. Any meaningful question that can be asked of how to arrange those particles or rank certain arrangements compared to others must have an objective answer, because under naturalism there are no other kinds and no incomplete information. For the question to remain unanswerable at that point would require supernatural intervention and divine command theory to be true. If there can’t be an objective answer to morality, then FAI is literally impossible. Do remember that your thoughts and preferences on ethics are themselves an arrangement of particles to be solved. Instead I posit that the real morality is orders of magnitude more complicated, and finding it more difficult, than for real physics, real neurology, real social science, real economics, and can only be solved once these other fields are unified. If we were uncertain about the morality of stabbing someone, we could hypothetically stab someone to see what happens. When the particles of the knife rearrange the particles of their heart into a form that harms them, we’ll know it isn’t moral. When a particular subset of people with extensive training use their knife to very carefully and precisely rearrange the particles of the heart to help people, we call those people doctors and pay them lots of money because they’re doing good. But without a shitload of facts about how to exactly stab someone in the heart to save their life, that moral option would be lost to you. And the real morality is a superset that includes that action along with all others.
Even if it were true that under naturalism we could determine the outcome of various arrangements of particles, wouldn’t we still be left with the question of which final outcome was the most morally preferable?
But, you and I might have different moral preferences. How (under naturalism) do we objectively decide between your preferences and mine? And, Isn’t it also possible that neither your preferences nor my preferences are objectively moral?
Yup.
But that’s sort-of contained within “the positions of particles” (so long as all their other properties are included, such as temperature and chemical connections and so on... might need to include rays of light and non-particle stuff too!). The two are just different ways of describing the same thing. Just like every object around you could be described either with their usual names (“keyboard”, “desk”, etc.) or with an elaborate molecule-by-molecule description. Plenty of other descriptions are possible too (like “rectangular black colored thing with a bunch of buttons with letters on it” describes my keyboard kinda).
You don’t. True preferences (as opposed to mistaken preferences) aren’t something you get to decide. They are facts.
That’s an expression of ethical naturalism, not a defence of ethical naturalism.
Missing the point. Ethics needs to sort good actors from bad—decisions about punishments and rewards depend on it.
PS are you the same person as rkyeun? If not, to what extent are you on the same page?
(I’d say need to sort good choices from bad. Which includes the choice to punish or reward.) Discovering which choices are good and which are bad is a fact finding mission. Because:
1) it’s a fact whether a certain choice will successfully fulfill a certain desire or not
And 2) that’s what “good” literally means: desirable.
So that’s what any question of goodness will be about: what will satisfy desires.
No I’m not rkyeun. As for being on the same page...well I’m definitely a moral realist. I don’t know about their first iff-then statement though. Seems to me that strong moral realism could still exist if supernaturalism were true. Also, talking in terms of molecules is ridiculously impractical and unnecessary. I only talked in those terms because I was replying to a reply to those terms :P
Whose desires? The murderer wants to murder the victim; the victim doesn’t want to be murdered. You have realism without objectivism. There is a realistic fact about people’s preferences, but since the same act can increase one person’s utility and reduce another’s, there is no unambiguous way to label an arbitrary outcome.
Murder isn’t a foundational desire. It’s only a means to some other end. And usually isn’t even a good way to accomplish its ultimate end! It’s risky, for one thing. So usually it’s a false desire: if they knew the consequences of this murder compared to all other choices available, and they were correctly thinking about how to most certainly get what they really ultimately want, they’d almost always see a better choice.
(But even if it were foundational, not a means to some other end, you could imagine some simulation of murder satisfying both the “murderer”’s need to do such a thing and everyone else’s need for safety. Even the “murderer” would have a better chance of satisfaction, because they would be far less likely to be killed or imprisoned prior to satisfaction.)
Well first, in the most trivial way, you can unambiguously label an outcome as “good for X”. If it really is (it might not be, after all, the consequences of achieving or attempting murder might be more terrible for the would-be murderer than choosing not to attempt murder).
It works the same with (some? all?) other adjectives too. For example: soluble. Is sugar objectively soluble? Depends what you try to dissolve it in, and under what circumstances. It is objectively soluble in pure water at room temperature. It won’t dissolve in gasoline.
Second, in game theory you’ll find sometimes there are options that are best for everyone. But even when there isn’t, you can still determine which choices for the individuals maximize their chance of satisfaction and such. Objectively speaking, those will be the best choices they can make (again, that’s what it means for something to be a good choice). And morality is about making the best choices.
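To make that concrete, here is a minimal sketch in Python of the kind of calculation I mean, with invented action labels and payoff numbers (purely hypothetical, chosen so that one outcome happens to be best for everyone):

```python
# Minimal sketch: a two-player game with hypothetical payoffs.
# We check whether some outcome gives every player their maximum
# possible payoff, i.e. an option that is "best for everyone".

# payoffs[(row_action, col_action)] = (row_payoff, col_payoff)
payoffs = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 2),
    ("defect",    "cooperate"): (2, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_for_everyone(payoffs):
    """Return an outcome giving every player their maximum payoff, or None."""
    n_players = len(next(iter(payoffs.values())))
    maxima = [max(p[i] for p in payoffs.values()) for i in range(n_players)]
    for outcome, p in payoffs.items():
        if all(p[i] == maxima[i] for i in range(n_players)):
            return outcome
    return None

print(best_for_everyone(payoffs))  # ('cooperate', 'cooperate') with these numbers
```

With these made-up numbers, mutual cooperation is best for both players; change the payoffs and the function returns None, which is the “even when there isn’t” case.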
It can be instrumental or terminal, as can most other criminal impulses.
You can’t solve all ethical problems by keeping everyone in permanent simulation.
That’s no good. You can’t arrive at workable ethics by putting different weightings on the same actions from different perspectives. X stealing money from Y is good for X and bad for Y, so why disregard Y’s view? An act is either permitted or forbidden, punished or praised. You can’t say it is permissible-for-X but forbidden-for-Y if it involves both of them.
No, there’s no uniform treatment of all predicates. Some are one-place, some are two-place. For instance, aesthetic choices can usually be fulfilled on a person-by-person basis.
To be precise, you sometimes find solutions that leave everyone better off, and more often find solutions that leave the average person better off.
Too vague. For someone who likes killing, to kill a lot of people is the best choice for them, but not the best ethical choice.
But, what if two different people have two conflicting desires? How do we objectively find the ethical resolution to the conflict?
Basically: game theory.
In reality, I’m not sure there ever are precise conflicts of true foundational desires. Maybe it would help if you had some real example or something. But the best choice for each party will always be the one that maximizes their chances of satisfying their true desire.
I was surprised to hear that you doubt that there are ever conflicts in desires. But, since you asked, here is an example:
A is a sadist. A enjoys inflicting pain on others. A really wants to hurt B. B wishes not to be hurt by A. (For the sake of argument, let’s suppose that no simulation technology is available that would allow A to hurt a virtual B, and that A can be reasonably confident that A will not be arrested and brought to trial for hurting B.)
In this scenario, since A and B have conflicting desires, how does a system that defines objective goodness as that which will satisfy desires resolve the conflict?
I would be very surprised to find that a universe whose particles are arranged to maximize objective good would also contain unpaired sadists and masochists. You seem to be asking a question of the form, “But if we take all the evil out of the universe, what about evil?” And the answer is “Good riddance.” Pun intentional.
The problem is that neither you nor BrianPansky has proposed a viable objective standard for goodness. BrianPansky said that good is that which satisfies desires, but proposed no objective method for mediating conflicting desires. And here you said “Do remember that your thoughts and preference on ethics are themselves an arrangement of particles to be solved” but proposed no way for resolving conflicts between different people’s ethical preferences. Even if satisfying desires were an otherwise reasonable standard for goodness, it is not an objective standard, since different people may have different desires. Similarly, different people may have different ethical preferences, so an individual’s ethical preference would not be an objective standard either, even if it were otherwise a reasonable standard.
No, I am not asking that. I am pointing out that neither your standard nor BrianPansky’s standard is objective. Therefore neither can be used to determine what would constitute an objectively maximally good universe nor could either be used to take all evil out of the universe, nor even to objectively identify evil.
Re-read what I said. That’s not what I said.
First get straight: good literally objectively does mean desirable. You can’t avoid that. Your question about conflict can’t change that (thus it’s a red herring).
As for your question: I already generally answered it in my previous post. Use game theory. Find the actions that will actually be best for each agent. The best choice for each party will always be the one that maximizes their chances of satisfying their true desires.
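As a rough illustration of what “find the actions that will actually be best for each agent” could look like for your A/B example, here is a toy best-response calculation (the action labels and payoff numbers are invented purely to show the mechanics, not to settle the ethics):

```python
# Toy best-response calculation for a two-player conflict.
# payoffs[(a_action, b_action)] = (A's payoff, B's payoff); all numbers invented.
payoffs = {
    ("hurt",    "resist"): (1, -5),
    ("hurt",    "flee"):   (0, -1),
    ("refrain", "resist"): (0,  0),
    ("refrain", "flee"):   (0, -1),
}

def best_responses(payoffs, player):
    """For each action the other player might take, find the action
    that maximizes this player's own payoff."""
    other = 1 - player
    other_actions = sorted({k[other] for k in payoffs})
    responses = {}
    for other_action in other_actions:
        options = {k[player]: v[player]
                   for k, v in payoffs.items() if k[other] == other_action}
        responses[other_action] = max(options, key=options.get)
    return responses

print(best_responses(payoffs, player=0))  # A's best response to each action of B
print(best_responses(payoffs, player=1))  # B's best response to each action of A
```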
I might finish a longer response to your specific example, but that takes time. For now, Richard Carrier’s Goal Theory Update probably covers a lot of that ground.
http://richardcarrier.blogspot.ca/2011/10/goal-theory-update.html
It does not.
Wiktionary states that it means “Acting in the interest of good; ethical.” (There are a few other definitions, but I’m pretty sure this is the right one here). Looking through the definitions of ‘ethical’, I find “Morally approvable, when referring to an action that affects others; good.” ‘Morally’ is defined as “In keeping of requirements of morality.”, and ‘morality’ is “Recognition of the distinction between good and evil or between right and wrong; respect for and obedience to the rules of right conduct; the mental disposition or characteristic of behaving in a manner intended to produce morally good results.”
Nowhere in there do I see anything about “desirable”—it seems to simplify down to “following a moral code”. I therefore suspect that you’re implicitly assuming a moral code which equates “desirable” with “good”—I don’t think that this is the best choice of a moral code, but it is a moral code that I’ve seen arguments in favour of before.
But, importantly, it’s not the only moral code. Someone who follows a different moral code can easily find something that is good but not desirable; or desirable but not good.
It’s not at all clear that morally good means desirable. The idea that the good is the desirable gets what force it has from the fact that “good” has a lot of nonmoral meanings. Good ice cream is desirable ice cream, but what’s that got to do with ethics?
Morally good means what it is good to do. So there is something added to “good” to get “morally good”: namely, it is what is good all things considered, and good to do, as opposed to good in other ways that have nothing to do with doing.
If it would be good to eat ice cream at the moment, then eating ice cream is morally good. And if it would be bad to eat ice cream at the moment, then eating ice cream is morally bad.
But when you say “good ice cream,” you aren’t talking about what it is good to do, so you aren’t talking about morality. Sometimes it is good to eat bad ice cream (e.g. you have been offered it in a situation where it would be rude to refuse), and then it is morally good to eat the bad ice cream, and sometimes it is bad to eat good ice cream (e.g. you have already eaten too much), and then it is morally bad to eat the good ice cream.
That’s a theory of what “morally” is adding to “good”. You need to defend it against alternatives, rather than stating it as if it were obvious.
Are you sure? How many people agree with that? Do you have independent evidence, or are you just following through the consequences of your assumptions (i.e. arguing in circles)?
I think most people would say that it doesn’t matter if you eat ice cream or not, and in that sense they might say it is morally indifferent. However, while I agree that it mainly doesn’t matter, I think they are either identifying “non-morally obligatory” with indifferent here, or else taking something that doesn’t matter much, and speaking as though it doesn’t matter at all.
But I think that most people would agree that gluttony is a vice, and that implies that there is an opposite virtue, which would mean eating the right amount and at the right time and so on. And eating ice cream when it is good to eat ice cream would be an act of that virtue.
Would you agree that discussion about “morally good” is discussion about what we ought to do? It seems to me this is obviously what we are talking about. And we should do things that are good to do, and avoid doing things that are bad to do. So if “morally good” is about what we should do, then “morally good” means something it is good to do.
What is wrong with saying it doesn’t matter at all?
That’s pretty much changing the subject.
I think it is about what we morally ought to do. If you are playing chess, you ought to move the bishop diagonally, but that is again non-moral.
We morally-should do what is morally-good, and hedonistically-should do what is hedonistically-good, and so on. These can conflict, so they are not the same.
Talking about gluttony and temperance was not changing the subject. Most people think that morally good behavior is virtuous behavior, and morally bad behavior vicious behavior. So that implies that gluttony is morally bad, and temperance morally good. And if eating too much ice cream can be gluttony, then eating the right amount can be temperance, and so morally good.
There is a lot wrong with saying “it doesn’t matter at all”, but basically you would not bother with eating ice cream unless you had some reason for it, and any reason would contribute to making it a good thing to do.
I disagree completely with your statements about should, which do not correspond with any normal usage. No one talks about “hedonistically should.”
To reduce this to its fundamentals:
“I should do something” means the same thing as “I ought to do something”, which means the same thing as “I need to do something, in order to accomplish something else.”
Now if we can put whatever we want for “something else” at the end there, then you can have your “hedonistically should” or “chess playing should” or whatever.
But when we are talking about morality, that “something else” is “doing what is good to do.” So “what should I do?” has the answer “whatever you need to do, in order to be doing something good to do, rather than something bad to do.”
It’s changing the subject because you are switching from an isolated act to a pattern of behaviour.
Such as?
You are using good to mean morally good again.
You can’t infer the non-existence of a distinction from the fact that it is not regularly marked in ordinary language.
“Jade is an ornamental rock. The term jade is applied to two different metamorphic rocks that are composed of different silicate minerals:
So you say. Actually, the idea that ethical claims can be cashed out as hypotheticals is quite contentious.
Back to the usual problem. “What you morally-should do is whatever you need to do in order to be doing something morally good” is true but vacuous. “What you morally-should do is whatever you need to do in order to be doing something good” is debatable.
The point about the words is that it is easy to see from their origins that they are about hypothetical necessity. You NEED to do something. You MUST do it. You OUGHT to do it, that is you OWE it and you MUST pay your debt. All of that says that something has to happen, that is, that it is somehow necessary.
Now suppose you tell a murderer, “It is necessary for you to stop killing people.” He can simply say, “Necessary, is it?” and then kill you. Obviously it is not necessary, since he can do otherwise. So what did you mean by calling it necessary? You meant it was necessary for some hypothesis.
I agree that some people disagree with this. They are not listening to themselves talk.
The reason that moral good means doing something good is that the hypothesis we always care about is whether it would be good to do something. That gives you a reason to say “it is necessary” without saying for what, because everyone wants to do something that would be good to do.
Suppose you define moral goodness to be something else. Then it might turn out that it would be morally bad to do something that would be good to do, and morally good to do something that would be bad to do. But in that case, who would say that we ought to do the thing which is morally good, instead of the thing that would be good to do? They would say we should do the thing that would be good to do, again precisely because it is necessary, and therefore we MUST do the supposedly morally bad thing, in order to be doing something good to do.
You are assuming that the only thing that counts as necessity per se is physical necessity, i.e. that there is no physical possibility of doing otherwise. But moral necessity is more naturally cashed out as the claim that there is no permissible state of affairs in which the murderer can murder.
http://www.hsu.edu/academicforum/2000-2001/2000-1AFThe%20Logic%20of%20Morality.pdf
In less abstract terms, what we are saying is that morality does not work like a common-or-garden in-order-to-achieve-X-do-Y, because you cannot excuse yourself, or obtain permissibility, simply by stating that you have some end in mind other than being moral. Even without logical necessity, morality has social obligatoriness, and that needs to be explained; a vanilla account in terms of hypothetical necessities in order to achieve arbitrary ends cannot do that.
If the moral good were just a rubber-stamp of approval for whatever we have in our utility functions, there would be no need for morality as a behaviour-shaping factor in human society. Morality is not “do what thou wilt”.
In some sense of “good”, but, as usual, an unqualified “good” does not give you plausible morality.
It’s tautologous that we morally-should do what is morally-good.
The “no permissible state of affairs” idea is also hypothetical necessity: “you must do this, if we want a situation which we call permissible.”
As I think I have stated previously, the root of this disagreement is that you believe, like Eliezer, that reality is indifferent in itself. I do not believe that.
In particular, I said that good things tend to make us desire them. You said I had causality reversed there. But I did not: I had it exactly right. Consider survival, which is an obvious case of something good. Does the fact that we desire something, e.g. eating food instead of rocks, make it into something that makes us survive? Or rather, is the fact that it makes us survive the cause of the fact that we desire it? It is obvious from how evolution works that the latter is the case and not the former. So the fact that eating food is good is the cause of the fact that we desire it.
I said the basic moral question is whether it would be good to do something. You say that this is putting a “rubber-stamp of approval for whatever we have in our utility functions.” This is only the case according to your misunderstanding of the relationship between desire and good. Good things tend to make us desire them. But just because there is a tendency does not mean it always works out. Things tend to fall, but they don’t fall if someone catches them. And similarly good things tend to make us desire them, but once in a while that fails to work out and someone desires something bad instead. So saying “do whatever is good to do” is indeed morality, but it definitely does not mean “do whatever thou wilt.”
I don’t care about “morally-should” as opposed to what I should do. I think I should do whatever would be good to do; and if that’s different from what you call moral, that’s too bad for you.
I still don’t think you have made a good case for morality being hypothetical, since you haven’t made a case against the case against. And I still think you need to explain obligatoriness.
Survival is good, you say. If I am in a position to ensure my survival by sacrificing Smith, is it morally good to do so? After all, Smith’s survival is just as good as mine.
Doesn’t-care is made to care. If you don’t behave as though you care about morality, society will punish you. However, it won’t punish you for failing to fulfil other shoulds.
I didn’t see any good case against morality being hypothetical, not even in that article.
I did explain obligatoriness. It is obligatory to do something morally good because we don’t have a choice about wanting to do something good. Everyone wants to do that, and the only way you can do that is by doing something morally good.
I did say I do not care about morally-should “as opposed” to what I should do. It could sometimes happen that I should not do something because people will punish me if I do it. In other words, I do care about what I should do, and that is determined by what would be good to do.
From which it follows that nobody ever fails to do what is morally good, that their inevitable moral goodness is the result of inner psychological compulsion rather than outer systems of reward and punishment, and that no systems of reward and punishment were ever necessary. All of that is clearly false.
Unless there are non-moral goods, which there clearly are, since there are immoral and amoral acts committed to obtain them.
“From which it follows that nobody ever fails to do what is morally good”
No, it does not, unless you assume that people are never mistaken about what would be good to do. I already said that people are sometimes mistaken about this, and think that it would be good to do something, when it would be bad to do it. In those cases they fail to do what is morally good.
I agree there are non-moral goods, e.g. things like pleasure and money and so on. That is because a moral good is “doing something good”, and pleasure and money are not doing anything. But people who commit immoral acts in order to obtain those goods, also believe that they are doing something good, but they are mistaken.
Right. You said: “In reality, I’m not sure there ever are precise conflicts of true foundational desires.”
Do you have an objective set of criteria for differentiating between true foundational desires and other types of desires? If not, I wonder if it is really useful to respond to an objection arising from the rather obvious fact that people often have conflicting desires by stating that you doubt that true foundational desires are ever in precise conflict.
As CCC has already pointed out, no, it is not apparent that (morally) good and desirable are the same thing. I won’t spend more time on this point since CCC addressed it well.
The issue that we are discussing is objective morals. Your equating goodness and desirability leads (in my example of the sadist) A to believe that hurting B is good, and B to believe that hurting B is not good. But moral realism holds that moral valuations are statements that are objectively true or false. So, conflicting desires is not a red herring, since conflicting desires leads (using your criterion) to subjective moral evaluations regarding the goodness of hurting B. Game theory on the other hand does appear to be a red herring – no application of game theory can change the fact that A and B differ regarding the desirability of hurting B.
One additional problem with equating moral goodness with desirability is that it leads to moral outcomes that are in conflict with most people’s moral intuitions. For example, in my example of the sadist A desires to hurt B, but most people’s moral intuition would say that A hurting B just because A wants to hurt B would be immoral. Similarly, rape, murder, theft, etc., could be considered morally good by your criterion if any of those things satisfied a desire. While conflicting with moral intuition does not prove that your definition is wrong, it seems to me that it should at a minimum raise a red flag. And, I think that the burden is on you to explain why anyone should reject his/her moral intuition in favor of a moral criterion that would adjudge theft, rape and murder to be morally good if they satisfy a true desire.
You need to refute non-cognitivism, as well as asserting naturalism.
Naturalism says that all questions that have answers have naturalistic answers, which means that if there are answers to ethical questions, they are naturalistic answers. But there is no guarantee that ethical questions mean anything, or that they have answers.
No, only non-cognitivism, the idea that ethical questions just don’t make sense, like “how many beans make yellow?”.
Not unless the “F” is standing for something weird. Absent objective morality, you can possibly solve the control problem, i.e. achieve safety by just making the AI do what you want; and absent objective morality, you can possibly achieve AI safety by instilling a suitable set of arbitrary values. Neither is easy, but you said “impossible”.
That’s not an argument for cognitivism. When I entertain the thought “how many beans make yellow?”, that’s an arrangement of particles.
Do you have an argument for that proposal? Because I am arguing for something much simpler, that morality only needs to be grounded at the human level, so reductionism is neither denied nor employed.
It’s hard to see what point you are making there. The social and evaluative aspects do make a difference to the raw physics, so much so that the raw physics counts for very little. Yet previously you were insisting that a reduction to fundamental particles was what underpinned the objectivity of morality.
Yes; good points! Do note that my original comment was made eight years ago! (At least – it was probably migrated from Overcoming Bias if this post is as early as it seems to be.)
So I have had some time to think along these lines a little more :)
But I don’t think intelligence itself can lead one to conclude as you have:
It’s not obvious to me now that any particular distinction will be made by any particular intelligence. There’s maybe not a literally infinite number, but still a VAST number of possible ontologies with which to make distinctions. The general class of ‘intelligent systems’ is almost certainly WAY more alien than we can reasonably imagine. I don’t assume that even a ‘super-intelligence’ would definitely ever “differentiate between smiley-faces and happy people”.
But I don’t remember this post that well, and I was going to re-read it before I realized that I didn’t even know what I was originally replying to (as it didn’t seem to be the post itself), and re-constructing the entire context to write a better reply is something my temporal margin “is too narrow to contain” at the moment.
But I think I still disagree with whatever Shane wrote!