Assuming morality is lots of highly localised, different things... which I don’t, particularly.
The problem of FAI is the problem of figuring out all of humanity’s deepest concerns and preferences, not just the problem of figuring out the ‘moral’ ones (whichever those are). E.g., we want a superintelligence to not make life boring for everyone forever, even if ‘don’t bore people’ isn’t a moral imperative.
Regardless, I don’t see how the moral subset of human concerns could be simplified without sacrificing most human intuitions about what’s right and wrong. Human intuitions as they stand aren’t even consistent, so I don’t understand how you can think the problem of making them consistent and actionable is going to be a simple one.
If it is not, then you can figure it out anywhere...
Someday, perhaps. With enough time and effort invested. Still, again, we would expect a lot more human-intelligence-level aliens (even if those aliens knew a lot about human behavior) to be good at building better AIs than to be good at formalizing human value. For the same reason, we should expect a lot more possible AIs we could build to be good at building better AIs than to be good at formalizing human value.
If it is, then the problem the aliens have is not that morality is imponderable.
I don’t know what you mean by ‘imponderable’. Morality isn’t ineffable; it’s just way too complicated for us to figure out. We know how things are on Earth; we’ve been gathering data and theorizing about morality for centuries. And our progress in formalizing morality has been minimal.
An averagely intelligent AI with an average grasp of morality would not be more of a threat than an average human.
An AI that’s just a copy of a human running on transistors is much more powerful than a human, because it can think and act much faster.
A smart AI would, all other things being equal, be better at figuring out morality.
It would also be better at figuring out how many atoms are in my fingernail, but that doesn’t mean it will ever get an exact count. The question is how rough an approximation of human value can we allow before all value is lost; this is the ‘fragility of values’ problem. It’s not enough for an AGI to do better than us at FAI; it has to be smart enough to solve the problem to a high level of confidence and precision.
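To make the fragility point concrete, here is a toy sketch (the setup and numbers are my own illustrative assumptions, not anyone’s actual model of human value): suppose the ‘true’ value of an outcome is conjunctive, so a future is only as good as the worst-served thing humans care about, while the proxy being optimized simply omits one of those things.

```python
import numpy as np

def true_value(x):
    # Conjunctive toy value: an outcome is only as good as its worst dimension.
    return float(np.min(x))

def proxy_value(x):
    # A "nearly right" specification that simply omits the last dimension.
    return float(np.min(x[:-1]))

# An optimizer with a fixed budget of 1.0 to split across ten value dimensions.
spread_everywhere  = np.full(10, 0.10)                   # best for true_value
starve_the_omitted = np.append(np.full(9, 1 / 9), 0.0)   # best for proxy_value

print(proxy_value(starve_the_omitted) > proxy_value(spread_everywhere))  # True
print(true_value(starve_the_omitted))                                    # 0.0
```

The numbers are arbitrary; the point is only that a strong optimizer pointed at an almost-right objective will happily drive a neglected dimension of value to zero.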
But why should moral concepts be so much more difficult than others?
First, because they’re anthropocentric; ‘iron’ can be defined simply because it’s a common pattern in Nature, not a rare high-level product of a highly contingent and complex evolutionary history. Second, because they’re very inclusive; ‘what humans care about’ or ‘what humans think is Right’ is inclusive of many different human emotions, intuitions, cultural conventions, and historical accidents.
But the main point is just that human value is difficult, not that it’s the most difficult thing we could do. If other tasks are also difficult, that doesn’t necessarily make FAI easier.
An AI smart enough to talk its way out of a box would be able to understand the implicit complexity: an AI too dumb to understand implicit complexity would be boxable. Where is the problem?
You’re forgetting the ‘seed is not the superintelligence’ lesson from The genie knows, but doesn’t care. If you haven’t read that article, go do so. The seed AI is dumb enough to be boxable, but also too dumb to plausibly solve the entire FAI problem itself. The superintelligent AI is smart enough to solve FAI, but also too smart to be safely boxed; and it doesn’t help us that an unFriendly superintelligent AI has solved FAI, if by that point it’s too powerful for us to control. You can’t safely pass the buck to a superintelligence to tell us how to build a superintelligence safe enough to pass bucks to.
Things are not inherently dangerous just because they are unpredictable. If you have some independent reason for thinking something might turn dangerous, then it becomes desirable to predict it.
Yes. The five theses give us reason to expect superintelligent AI to be dangerous by default. Adding more unpredictability to a system that already seems dangerous will generally make it more dangerous.
They are not assumed to develop mysterious blind spots about falconry or mining engineering. Why assume they will develop a blind spot about morality?
‘The genie knows, but doesn’t care’ means that the genie (i.e., superintelligence) knows how to do human morality (or could easily figure it out, if it felt like trying), but hasn’t been built to care about human morality. Knowing how to behave the way humans want you to is not sufficient for actually behaving that way; Eliezer makes that point well in No Universally Compelling Arguments.
The worry isn’t that the superintelligence will be dumb about morality; it’s that it will be indifferent to morality, and that by the time it exists it will be too late to safely change that indifference. The seed AI (which is not a superintelligence, but is smart enough to set off a chain of self-modifications that lead to a superintelligence) is dumb about morality (approximately as dumb as humans are, if not dumber), and is also probably not a particularly amazing falconer or miner. It only needs to be a competent programmer, to qualify as a seed AI.
The average person manages to solve the problem of being moral themselves, in a good-enough way.
Good enough for going to the grocery store without knifing anyone. Probably not good enough for safely ruling the world. With greater power comes a greater need for moral insight, and a greater risk should that insight be absent.
Why isn’t the lack of a formalisation of morality a problem for humans?
It is a problem, and it leads to a huge amount of human suffering. It doesn’t mean we get everything wrong, but we do make moral errors on a routine basis; the consequences are mostly non-catastrophic because we’re slow, weak, and have adopted some ‘good-enough’ heuristics for bounded circumstances.
We know how humans incrementally improve as moral reasoners: it’s called the Kohlberg hierarchy.
Just about every contemporary moral psychologist I’ve read or talked to seems to think that Kohlberg’s overall model is false. (Though some may think it’s a useful toy model, and it certainly was hugely influential in its day.) Haidt’s The Emotional Dog and Its Rational Tail gets cited a lot in this context.
We do have morality tests. Fail them and you get pilloried in the media or sent to jail.
That’s certainly not good enough. Build a superintelligence that optimizes for ‘following the letter of the law’ and you don’t get a superintelligence that cares about humans’ deepest values. The law itself has enough inexactness and arbitrariness that it causes massive needless human suffering on a routine basis, though it’s another one of those ‘good-enough’ measures we keep in place to stave off even worse descents into darkness.
If it works like arithmetic, that is, if it is an expansion of some basic principles...
Human values are an evolutionary hack resulting from adaptations to billions of different selective pressures over billions of years, innumerable side-effects of those adaptations, genetic drift, etc. Arithmetic can be formalized in a few sentences. Why think that humanity’s deepest preferences are anything like that simple? Our priors should be very low for ‘human value is simple’ just given the etiology of human value, and our failure to converge on any simple predictive or normative theory thus far seems to only confirm this.
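For contrast, here is roughly what ‘formalizable in a few sentences’ looks like: a minimal Lean-style sketch of the natural numbers and addition (illustration only; real libraries define these properly).

```lean
-- A minimal sketch: the natural numbers and addition, in a handful of lines.
inductive N where
  | zero : N
  | succ : N → N

def add : N → N → N
  | n, N.zero   => n
  | n, N.succ m => N.succ (add n m)
```

Nothing remotely analogous is known for ‘humanity’s deepest preferences’.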
The problem of FAI is the problem of figuring out all of humanity’s deepest concerns and preferences, not just the problem of figuring out the ‘moral’ ones (whichever those are)
Those two things turn out to be identical (deepest concerns and preferences = the ‘moral’ ones), because nothing else can be of greater importance to a decision maker.
An AI smart enough to talk its way out of a box would be able to understand the implicit complexity: an AI too dumb to understand implicit complexity would be boxable. Where is the problem?
You’re forgetting the ‘seed is not the superintelligence’ lesson from The genie knows, but doesn’t care. If you haven’t read that article, go do so. The seed AI is dumb enough to be boxable, but also too dumb to plausibly solve the entire FAI problem itself.
I am arguing that it would not have to solve FAI itself.
The superintelligent AI is smart enough to solve FAI, but also too smart to be safely boxed;
Huh? If it is moral and alien-friendly, why would you need to box it?
and it doesn’t help us that an unFriendly superintelligent AI has solved FAI, if by that point it’s too powerful for us to control. You can’t safely pass the buck to a superintelligence to tell us how to build a superintelligence safe enough to pass bucks to.
If it’s friendly, why enslave it?
Things are not inherently dangerous just because they are unpredictable. If you have some independent reason for thinking something might turn dangerous, then it becomes desirable to predict it.
Yes. The five theses give us reason to expect superintelligent AI to be dangerous by default. Adding more unpredictability to a system that already seems dangerous will generally make it more dangerous.
The five theses are variously irrelevant and misapplied. Details supplied on request.
They are not assumed to develop mysterious blind spots about falconry or mining engineering. Why assume they will develop a blind spot about morality?
‘The genie knows, but doesn’t care’ means that the genie (i.e., superintelligence) knows how to do human morality (or could easily figure it out, if it felt like trying), but hasn’t been built to care about human morality.
What genie? Who built it that way? If your policy is to build an artificial philosopher, an AI that can solve morality itself, why would you build it to not act on what it knows?
Knowing how to behave the way humans want you to is not sufficient for actually behaving that way; Eliezer makes that point well in No Universally Compelling Arguments.
No, his argument is irrelevant as explained in this comment.
The worry isn’t that the superintelligence will be dumb about morality; it’s that it will be indifferent to morality,
You don’t have to pre-programme the whole of friendliness or morality to fix that. If you have reason to suspect that there are no intrinsically compelling concepts, then you can build an AI that wants to be moral, but needs to figure out what that is.
and that by the time it exists it will be too late to safely change that indifference. The seed AI (which is not a superintelligence, but is smart enough to set off a chain of self-modifications that lead to a superintelligence) is dumb about morality (approximately as dumb as humans are, if not dumber), and is also probably not a particularly amazing falconer or miner. It only needs to be a competent programmer, to qualify as a seed AI.
Which is only a problem if you assume, as I don’t, that it will be pre-programmed with a fixed morality.
The average person manages to solve the problem of being moral themselves, in a good-enough way.
Good enough for going to the grocery store without knifing anyone. Probably not good enough for safely ruling the world. With greater power comes a greater need for moral insight, and a greater risk should that insight be absent.
With greater intelligence comes greater moral insight—unless you create a problem by walling off that part of an AI.
Why isn’t the lack of a formalisation of morality a problem for humans?
It is a problem, and it leads to a huge amount of human suffering. It doesn’t mean we get everything wrong, but we do make moral errors on a routine basis; the consequences are mostly non-catastrophic because we’re slow, weak, and have adopted some ‘good-enough’ heuristics for bounded circumstances.
OK. The consequences are non-catastrophic. An AI with imperfect, good-enough morality would not be an existential threat.
We know how humans incrementally improve as moral reasoners: it’s called the Kohlberg hierarchy.
Just about every contemporary moral psychologist I’ve read or talked to seems to think that Kohlberg’s overall model is false. (Though some may think it’s a useful toy model, and it certainly was hugely influential in its day.) Haidt’s The Emotional Dog and Its Rational Tail gets cited a lot in this context.
And does Haidt’s work mean that everyone is on par, morally? Does it mean that no one can progress in moral insight?
We do have morality tests. Fail them and you get pilloried in the media or sent to jail.
That’s certainly not good enough. Build a superintelligence that optimizes for ‘following the letter of the law’ and you don’t get a superintelligence that cares about humans’ deepest values.
It isn’t good enough for a ceiling: it is good enough for a floor.
If it works like arithmetic, that is, if it is an expansion of some basic principles...
Human values are an evolutionary hack resulting from adaptations to billions of different selective pressures over billions of years, innumerable side-effects of those adaptations, genetic drift, etc.
De facto ones are, yes. Likewise folk physics is an evolutionary hack. But if we build an AI to do physics, we don’t intend it to do folk physics, we intend it to do physics.
Arithmetic can be formalized in a few sentences. Why think that humanity’s deepest preferences are anything like that simple?
There’s a theory of morality that can be expressed in a few sentences, and leaves preferences as variables to be filled in later. It’s called utilitarianism.
Our priors should be very low for ‘human value is simple’ just given the etiology of human value, and our failure to converge on any simple predictive or normative theory thus far seems to only confirm this.
So? If value is complex, that doesn’t affect utilitarianism, for instance. You, and other LessWrongian writers, keep behaving as though “values are X” is obviously equivalent to “morality is X”.
“The superintelligent AI is smart enough to solve FAI, but also too smart to be safely boxed;”
Huh? If it is moral and alien-friendly, why would you need to box it?
You’re confusing ‘smart enough to solve FAI’ with ‘actually solved FAI’, and you’re confusing ‘actually solved FAI’ with ‘self-modified to become Friendly’. Most possible artificial superintelligences have no desire to invest much time into figuring out human value, and most possible ones that do figure out human value have no desire to replace their own desires with the desires of humans. If the genie knows how to build a Friendly AI, that doesn’t imply that the genie is Friendly; so superintelligence doesn’t in any way imply Friendliness even if it implies the ability to become Friendly.
No, his argument is irrelevant as explained in this comment.
Why does that comment make his point irrelevant? Are you claiming that it’s easy to program superintelligences to be ‘rational’, where ‘rationality’ doesn’t mean instrumental or epistemic rationality but instead means something that involves being a moral paragon? It just looks to me like black-boxing human morality to make it look simpler or more universal.
If you have reason to suspect that there are no intrinsically compelling concepts, then you can build an AI that wants to be moral, but needs to figure out what that is.
And how do you code that? If the programmers don’t know what ‘be moral’ means, then how do they code the AI to want to ‘be moral’? See Truly Part Of You.
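A deliberately unsatisfying sketch of the difficulty (all names here are hypothetical, made up for illustration): an agent that ‘wants to be moral’ still bottoms out in some evaluation the programmers had to supply somehow, and that evaluation is exactly the part nobody knows how to write down.

```python
def moral_value(outcome) -> float:
    """How moral is `outcome`? In terms of which primitives, computed how?"""
    raise NotImplementedError("this placeholder is the unsolved part")

def choose_action(actions, predict_outcome):
    # An agent built to "want to be moral" still needs moral_value specified
    # somewhere: hand-coded, learned from data someone selected, or derived
    # from a criterion someone wrote. Moving it into a learning procedure
    # relocates the problem; it does not remove it.
    return max(actions, key=lambda a: moral_value(predict_outcome(a)))
```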
An AI with imperfect, good-enough morality would not be an existential threat.
A human with superintelligence-level superpowers would be an existential threat. An artificial intelligence with superintelligence-level superpowers would therefore also be an existential threat, if it were merely as ethical as a human. If your bar is set low enough to cause an extinction event, you should probably raise your bar a bit.
And does Haidt’s work mean that everyone is on par, morally? Does it mean that no one can progress in moral insight?
No. Read Haidt’s paper, and beware of goalpost drift.
It isn’t good enough for a ceiling: it is good enough for a floor.
No. Human law isn’t built for superintelligences, so it doesn’t put special effort into blocking loopholes that would be available to an ASI. E.g., there’s no law against disassembling the Sun, because no lawmaker anticipated that anyone would have that capability.
There’s a theory of morality that can be expressed in a few sentences, and leaves preferences as variables to be filled in later. It’s called utilitarianism.
… Which isn’t computable, and provides no particular method for figuring out what the variables are. ‘Preferences’ isn’t operationalized.
You, and other LessWrongian writers, keep behaving as though “values are X” is obviously equivalent to “morality is X”.
Values in general are what matters for Friendly AI, not moral values. Moral values are a proper subset of what’s important and worth protecting in humanity.
The problem of FAI is the problem of figuring out all of humanity’s deepest concerns and preferences, not just the problem of figuring out the ‘moral’ ones (whichever those are). E.g., we want a superintelligence to not make life boring for everyone forever, even if ‘don’t bore people’ isn’t a moral imperative.
The AI might need a lot of localised information for friendliness, but it needn’t be preprogrammed.
You have assumed that friendliness is a superset of morality. Assume also that an AI is capable of being moral.
Then, to have a more fun existence, all you have to do is ask it questions, like “How can we build hovering skateboards?”
What failure modes could that lead to?
If it doesn’t know what things humans enjoy, it can research the subject; humans can entertain their pets, after all.
It would have no reason to refuse to answer questions, unless the answer was dangerous to human well-being (i.e., what humans assume is harmless fun actually isn’t). But that isn’t actually a failure; it’s a safety feature.
Regardless, I don’t see how the moral subset of human concerns could be simplified without sacrificing most human intuitions about what’s right and wrong.
That’s like saying you can’t simplify folk physics down to real physics without sacrificing a lot of intuitions. Intuitions that are wrong need to go.
Human intuitions as they stand aren’t even consistent, so I don’t understand how you can think the problem of making them consistent and actionable is going to be a simple one.
I didn’t say it was simple. I want the SAI to do it for itself. I don’t think the alternative, of solving friendliness (which is more than morality) and preprogramming it, is simple.
Someday, perhaps. With enough time and effort invested. Still, again, we would expect a lot more human-intelligence-level aliens (even if those aliens knew a lot about human behavior) to be good at building better AIs than to be good at formalizing human value. For the same reason, we should expect a lot more possible AIs we could build to be good at building better AIs than to be good at formalizing human value.
So what’s the critical difference between understanding value and understanding (e.g.) language? I think the asymmetry comes in where you assume that “value” has to be understood as a ragbag of attitudes and opinions. Everyone assumes that understanding physics means understanding the kind of physics found in textbooks, and that understanding language need only go as far as understanding a cleaned-up official version, not a superposition of every possible dialect and idiolect. Morality/value looks difficult to you because you are taking it to be the incoherent mess you would get by throwing everyone’s attitudes and beliefs into a pot indiscriminately. But many problems would be insoluble under that assumption.
If it is, then the problem the aliens have is not that morality is imponderable.
I don’t know what you mean by ‘imponderable’. Morality isn’t ineffable; it’s just way too complicated for us to figure out.
If you assume that all intuitions have to be taken into account, even conflicting ones, then it’s not just difficult, it’s impossible. But I don’t assume that.
We know how things are on Earth; we’ve been gathering data and theorizing about morality for centuries. And our progress in formalizing morality has been minimal.
Yet the average person is averagely moral. People, presumably, are not running on formalisation. If you assume that an AI has to be preprogrammed with morality, then you can conclude that an AI will need the formalisation we don’t have. If you assume that an AI is a learning system, then it can learn and does not need to be preprogrammed.
An averagely intelligent AI with an average grasp of morality would not be more of a threat than an average human.
An AI that’s just a copy of a human running on transistors is much more powerful than a human, because it can think and act much faster.
If you speed up a chicken brain by a million, what do you get?
A smart AI would, all other things being equal, be better at figuring out morality.
It would also be better at figuring out how many atoms are in my fingernail, but that doesn’t mean it will ever get an exact count. The question is how rough an approximation of human value can we allow before all value is lost; this is the ‘fragility of values’ problem.
There is no good reason to think that “morality” and “values” are synonyms.
It’s not enough for an AGI to do better than us at FAI; it has to be smart enough to solve the problem to a high level of confidence and precision.
If it’s too dumb to solve it, it’s too dumb to be a menace; if it’s smart enough to be a menace, it’s smart enough to solve it.
But why should moral concepts be so much more difficult than others?
First, because they’re anthropocentric;
Are they? Animals can be morally relevant to humans. Humans can be morally relevant to aliens. Aliens can be morally relevant to each other.
‘iron’ can be defined simply because it’s a common pattern in Nature, not a rare high-level product of a highly contingent and complex evolutionary history.
Anthropocentric and universal aren’t the only options. Alien morality is a coherent concept, like alien art and alien economics.
Second, because they’re very inclusive; ‘what humans care about’ or ‘what humans think is Right’ is inclusive of many different human emotions, intuitions, cultural conventions, and historical accidents.
Morality is about what is right, not what is believed to be right. Physics is not folk physics or the history of physics.
The AI might need a lot of localised information for friendliness, but it needn’t be preprogrammed.
I don’t know what you mean by ‘preprogrammed’, and I don’t know what view you think you’re criticizing by making this point. MIRI generally supports indirect normativity, not direct normativity.
You have assumed that friendliness is a superset of morality.
A Friendly AI is, minimally, a situation-generally safe AGI. By the intelligence explosion thesis, ‘situation-generally’ will need to encompass the situation in which an AGI self-modifies to an ASI (artificial superintelligence), and since ASIs are much more useful and dangerous than human-level AGIs, the bulk of the work in safety-proofing AGI will probably go into safety-proofing ASI.
A less minimal definition will say that Friendly AIs are AGIs that bring about situations humans strongly desire/value, and don’t bring about situations they strongly dislike/disvalue. One could also treat this as an empirical claim about the more minimal definition: Any adequately safe AGI will be extremely domain-generally useful.
Regardless, nowhere in the above two paragraphs did I talk specifically about morality. Moral values are important, but they are indeed a proper subset of human values, and we don’t want an AGI to make everything worse forever even if it finds a way to do so without doing anything ‘immoral’.
Assume also that an AI is capable of being moral.
No one has assumed otherwise. The problem isn’t that Friendly AI is impossible; it’s that most ASIs aren’t Friendly, and unFriendly ASIs seem to be easier to build (because they’re a more generic class).
That’s like saying you can’t simplify folk physics down to real physics without sacrificing a lot of intuitions. Intuitions that are wrong need to go.
Getting rid of wrong intuitions may well make morality more complicated, rather than less. We agree that human folk morality may need to be refined a lot, but that gives us no reason to expect the task to be easy or the end-product to be simple. Physical law appears to be simple, but it begets high-level regularities that are much less simple, like brains, genomes, and species. Morality occurs at a level closer to brains, genomes, and species than to physical law.
But many problems would be insoluble under that assumption.
If human civilization depended on building an AI that can domain-generally speak English in a way that we’d ideally recognize as Correct, then I would be extremely worried. We can get away with shortcuts and approximations because speaking English correctly isn’t very important. But getting small things permanently wrong about human values is important, when you’re in control of the future of humanity.
It might not look fair that humans have to deal with such a huge problem, but the universe doesn’t always give people reasonable-sized challenges.
If you assume that an AI is a learning system, then it can learn and does not need to be preprogrammed.
It has to be preprogrammed to learn the right things, and to incorporate the right things it’s learned into its preferences. Saying ‘Just program the AI to learn the right preferences’ doesn’t solve the problem; programming the AI to learn the right preferences is the problem. See Detached lever fallacy:
“All this goes to explain why you can’t create a kindly Artificial Intelligence by giving it nice parents and a kindly (yet occasionally strict) upbringing, the way it works with a human baby. As I’ve often heard proposed.
“It is a truism in evolutionary biology that conditional responses require more genetic complexity than unconditional responses. To develop a fur coat in response to cold weather requires more genetic complexity than developing a fur coat whether or not there is cold weather, because in the former case you also have to develop cold-weather sensors and wire them up to the fur coat.
“But this can lead to Lamarckian delusions: Look, I put the organism in a cold environment, and poof, it develops a fur coat! Genes? What genes? It’s the cold that does it, obviously.
“There were, in fact, various slap-fights of this sort, in the history of evolutionary biology—cases where someone talked about an organismal response accelerating or bypassing evolution, without realizing that the conditional response was a complex adaptation of higher order than the actual response. (Developing a fur coat in response to cold weather, is strictly more complex than the final response, developing the fur coat.) [...]
“But the upshot is that if you have a little baby AI that is raised with loving and kindly (but occasionally strict) parents, you’re pulling the levers that would, in a human, activate genetic machinery built in by millions of years of natural selection, and possibly produce a proper little human child. Though personality also plays a role, as billions of parents have found out in their due times.
“It’s easier to program in unconditional niceness, than a response of niceness conditional on the AI being raised by kindly but strict parents. If you don’t know how to do that, you certainly don’t know how to create an AI that will conditionally respond to an environment of loving parents by growing up into a kindly superintelligence. If you have something that just maximizes the number of paperclips in its future light cone, and you raise it with loving parents, it’s still going to come out as a paperclip maximizer. There is not that within it that would call forth the conditional response of a human child. Kindness is not sneezed into an AI by miraculous contagion from its programmers. Even if you wanted a conditional response, that conditionality is a fact you would have to deliberately choose about the design.
“Yes, there’s certain information you have to get from the environment—but it’s not sneezed in, it’s not imprinted, it’s not absorbed by magical contagion. Structuring that conditional response to the environment, so that the AI ends up in the desired state, is itself the major problem. ‘Learning’ far understates the difficulty of it—that sounds like the magic stuff is in the environment, and the difficulty is getting the magic stuff inside the AI. The real magic is in that structured, conditional response we trivialize as ‘learning’. That’s why building an AI isn’t as easy as taking a computer, giving it a little baby body and trying to raise it in a human family. You would think that an unprogrammed computer, being ignorant, would be ready to learn; but the blank slate is a chimera.”
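(A toy sketch of the ‘structured, conditional response’ point in the quoted passage, under contrived assumptions of my own: two learners are given exactly the same data but were built with different objectives, and they come away with different learned behaviour. The data alone does not determine what the system ends up pursuing.)

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2))        # the shared "environment"

def fit_direction(data, objective):
    """Pick the unit direction whose projection of the data scores best."""
    angles = np.linspace(0.0, np.pi, 180, endpoint=False)
    candidates = [np.array([np.cos(t), np.sin(t)]) for t in angles]
    return max(candidates, key=lambda d: objective(data @ d))

# Identical data, different built-in objectives, different learned results.
spread_seeker = fit_direction(data, lambda proj: proj.var())   # rewards variance
shift_seeker  = fit_direction(data, lambda proj: proj.mean())  # rewards mean

print(spread_seeker, shift_seeker)   # generally two different directions
```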
The AI might need a lot of localised information for friendliness, but it needn’t be preprogrammed.
I don’t know what you mean by ‘preprogrammed’,
I mean the proposal to solve morality and code it into an AI.
and I don’t know what view you think you’re criticizing by making this point. MIRI generally supports indirect normativity, not direct normativity.
You have assumed that friendliness is a superset of morality.
A Friendly AI is, minimally, a situation-generally safe AGI.
Which is to say that full-fat friendliness is a superset of minimal friendliness. But minimal friendliness is just what I have been calling morality, and I don’t see why I shouldn’t continue. So friendliness is a superset of morality, as I said.
By the intelligence explosion thesis, ‘situation-generally’ will need to encompass the situation in which an AGI self-modifies to an ASI (artificial superintelligence), and since ASIs are much more useful and dangerous than human-level AGIs, the bulk of the work in safety-proofing AGI will probably go into safety-proofing ASI.
...by your assumption that morality/friendliness needs to be solved separately from intelligence. But that is just what I am disputing.
A less minimal definition will say that Friendly AIs are AGIs that bring about situations humans strongly desire/value, and don’t bring about situations they strongly dislike/disvalue. One could also treat this as an empirical claim about the more minimal definition: Any adequately safe AGI will be extremely domain-generally useful.
An AGI can be useful without wanting to do anything but answer questions accurately.
Regardless, nowhere in the above two paragraphs did I talk specifically about morality.
You didn’t use the word. But I think “not doing bad things, whilst not necessarily doing fun things either” picks out the same referent.
Moral values are important, but they are indeed a proper subset of human values, and we don’t want an AGI to make everything worse forever even if it finds a way to do so without doing anything ‘immoral’.
I find it hard to interpret that statement. How can making things worse forever not be immoral? What non-moral definition of worse are you using?
That’s like saying you can’t simplify folk physics down to real physics without sacrificing a lot of intuitions. Intuitions that are wrong need to go.
Getting rid of wrong intuitions may well make morality more complicated, rather than less. We agree that human folk morality may need to be refined a lot, but that gives us no reason to expect the task to be easy or the end-product to be simple
We have very good reason to think that the one true theory of something will be simpler, in Kolmogorov terms, than a mishmash of everybody’s guesses. Physics is simpler than folk physics. (It is harder to learn, because that requires the effortful System 2 to engage... but effort and complexity are different things.)
And remember, my assumption is that the AI works out morality itself.
Physical law appears to be simple, but it begets high-level regularities that are much less simple, like brains, genomes, and species. Morality occurs at a level closer to brains, genomes, and species than to physical law.
If an ASI can figure out such high-level subjects as biology and decision theory, why shouldn’t it be able to figure out morality?
But many problems would be insoluble under that assumption.
If human civilization depended on building an AI that can domain-generally speak English in a way that we’d ideally recognize as Correct, then I would be extremely worried. We can get away with shortcuts and approximations because speaking English correctly isn’t very important. But getting small things permanently wrong about human values is important, when you’re in control of the future of humanity.
Why wouldn’t an AI that is smarter than us be able to realise that for itself?
It might not look fair that humans have to deal with such a huge problem, but the universe doesn’t always give people reasonable-sized challenges.
If you assume that an AI is a learning system, then it can learn and does not need to be preprogrammed.
It has to be preprogrammed to learn the right things, and to incorporate the right things it’s learned into its preferences.
That is confusingly phrased. A learning system needs some basis to learn, granted. You assume, tacitly, that it need not be preprogrammed with the right rules of grammar or economics. Why make an exception for ethics?
Saying ‘Just program the AI to learn the right preferences’ doesn’t solve the problem; programming the AI to learn the right preferences is the problem. See Detached lever fallacy:
A learning system needs some basis other than external stimulus to learn: given that, it is quite possible for most of the information to be contained in the stimulus, the data. Consider language. Do you think an AI will have to be preprogrammed with all the contents of every dictionary?