Juno_Watt comments on The genie knows, but doesn’t care

Juno_Watt 12 Sep 2013 14:14 UTC
−4 points
A. Solve the Problem of Meaning-in-General in advance, and program it to follow our instructions’ real meaning. Then just instruct it ‘Satisfy my preferences’, and wait for it to become smart enough to figure out my preferences.

That problem has got to be solved somehow at some stage, because something that couldn’t pass a Turing Test is no AGI.
But there are a host of problems with treating the mere revelation that A is an option as a solution to the Friendliness problem.
1. You have to actually code the seed AI to understand what we mean. Y
Why is that a problem? Is anyone suggesting AGI can be had for free?
1. The Problem of Meaning-in-General may really be ten thousand heterogeneous problems, especially if ‘semantic value’ isn’t a natural kind. There may not be a single simple algorithm that inputs any old brain-state and outputs what, if anything, it ‘means’; it may instead be that different types of content are encoded very differently.
Ok. NL is hard. Everyone knows that. But its got to be solved anyway.

3… On the face of it, programming an AI to fully understand ‘Be Friendly!’ seems at least as difficult as just programming Friendliness into it, but with an added layer of indirection.

Yeah, but it’s got to be done anyway.

[more of the same snipped]

It’s clear that building stable preferences out of B or C would create a Friendly AI.

Yeah. But it wouldn’t be an AGI or an SI if it couldn’t pass a TT.

The genie — if it bothers to even consider the question — should be able to understand what you mean by ‘I wish for my values to be fulfilled.’ Indeed, it should understand your meaning better than you do. But superintelligence only implies that the genie’s map can compass your true values. Superintelligence doesn’t imply that the genie’s utility function has terminal values pinned to your True Values, or to the True Meaning of your commands.

The issue of whether the SI’s UF contains a set of human values is irrelevant. In a Loosemore architecture, an AI needs to understand and follow the directive “be friendly to humans”, and those are all the goals it needs—to understand, and to follow;

When you write the seed’s utility function, you, the programmer, don’t understand everything about the nature of human value or meaning. That imperfect understanding remains the causal basis of the fully-grown superintelligence’s actions, long after it’s become smart enough to fully understand our values.

The UF only needs to contain “understand English, and obey this directive”. You don’t have to code semantics into the UF. You do of course, have to code it in somewhere,

Instead, we have to give it criteria we think are good indicators of Friendliness, so it’ll know what to self-modify toward

A problem which has been solved over and over by humans. Humans don’t need to be loaded apriori with what makes other humans happy, they only need to know general indicators, like smiles and statements of approval.

Yes, the UFAI will be able to solve Friendliness Theory. But if we haven’t already solved it on our own power, we can’t pinpoint Friendliness in advance, out of the space of utility functions. And if we can’t pinpoint it with enough detail to draw a road map to it and it alone, we can’t program the AI to care about conforming itself with that particular idiosyncratic algorithm.

Why would that be necessary? In the Loosemore architecture, the AGI has the goals of understanding English and obeying the Be Friendly directive. It eventually gets a detailed, extensional, understanding of Friendliness from pursuing those goals, Why would it need to be preloaded with a detailed, extensional unpacking of friendliness? It could fail in understanding English, of course. But there is no reason to think it is unlikely to fail at understanding “friendliness” specifically, and its competence can be tested as you go along.

And if we can’t pinpoint it with enough detail to draw a road map to it and it alone, we can’t program the AI to care about conforming itself with that particular idiosyncratic algorithm.

I don’t see the problem. In the Loosemore architecture, the AGI will care about obeying “be friendly”, and it will arrive at the detailed expansion, the idiosyncracies, of “friendly” as part of its other goal to understand English. It cares about being friendly, and it knows the detailed expansion of friendliness, so where’s the problem?

Yes, the UFAI will be able to self-modify to become Friendly, if it so wishes. But if there is no seed of Friendliness already at the heart of the AI’s decision criteria, no argument or discovery will spontaneously change its heart.

Says who? It has the high level directive, and another directive to understand the directive. It’s been Friendly in principle all along, it just needs to fill in the details.

Unless we ourselves figure out how to program the AI to terminally value its programmers’ True Intentions,

Then we do need to figure out how to program the AI to terminally value its programmers’ True Intentions. That is hardly a fatal objection. Did you think the Loosemore architecture was one that bootstraps itself without any basic goals?

And if we do discover the specific lines of code that will get an AI to perfectly care about its programmer’s True Intentions, such that it reliably self-modifies to better fit them — well, then that will just mean that we’ve solved Friendliness Theory.

No. The goal to understand English is not the same as a goal to be friendly in every way, it is more constrained.

Solving Friendliness, in the MIRI sense, means preloading a detailed expansion of “friendly”. That is not what is happening in the Loosemore architecture. So it is not equivalent to solving the same problem.

The clever hack that makes further Friendliness research unnecessary is Friendliness.

Nope.

Intelligence on its own does not imply Friendliness.

That is an open question.

It’s true that a sufficiently advanced superintelligence should be able to acquire both abilities. But we don’t have them both, and a pre-FOOM self-improving AGI (‘seed’) need not have both. Being able to program good programmers is all that’s required for an intelligence explosion; but being a good programmer doesn’t imply that one is a superlative moral psychologist or moral philosopher.

Then hurrah for the Loosemore architecture, which doesn’t require humans to”solve” friendliness in the MIRI sense.
- wedrifid 13 Sep 2013 9:21 UTC
  6 points
  Parent
  
  Solving Friendliness, in the MIRI sense, means preloading a detailed expansion of “friendly”.
  
  No, it doesn’t.
- Eliezer Yudkowsky 13 Sep 2013 9:27 UTC
  3 points
  Parent
  Juno_Watt, please take further discussion to RobbBB’s blog.
- Rob Bensinger 12 Sep 2013 17:36 UTC
  1 point
  Parent
  
  That problem has got to be solved somehow at some stage, because something that couldn’t pass a Turing Test is no AGI.
  
  Not so! An AGI need not think like a human, need not know much of anything about humans, and need not, for that matter, be as intelligent as a human.
  
  To see this, imagine we encountered an alien race of roughly human-level intelligence. Would a human be able to pass as an alien, or an alien as a human? Probably not anytime soon. Possibly not ever.
  
  (Also, passing a Turing Test does not require you to possess a particularly deep understanding of human morality! A simple list of some random things humans consider right or wrong would generally suffice.)
  
  Why is that a problem? Is anyone suggesting AGI can be had for free?
  
  The problem I’m pointing to here is that a lot of people treat ‘what I mean’ as a magical category. ‘Meaning’ and ‘language’ and ‘semantics’ are single words in English, which masks the complexity of ‘just tell the AI to do what I mean’.
  
  Ok. NL is hard. Everyone knows that. But its got to be solved anyway.
  
  Nope!
  
  Yeah. But it wouldn’t be an AGI or an SI if it couldn’t pass a TT.
  
  It could certainly be an AGI! It couldn’t be an SI—provided it wants to pass a Turing Test, of course—but that’s not a problem we have to solve. It’s one the SI can solve for itself.
  
  A problem which has been solved over and over by humans.
  
  No human being has ever created anything—no system of laws, no government or organization, no human, no artifact—that, if it were more powerful, would qualify as Friendly. In that sense, everything that currently exists in the universe is non-Friendly, if not outright Unfriendly.
  
  Humans don’t need to be loaded apriori with what makes other humans happy, they only need to know general indicators, like smiles and statements of approval.
  
  All or nearly all humans, if they were more powerful, would qualify as Unfriendly.
  
  Moreover, by default, relying on a miscellaneous heap of vaguely moral-sounding machine learning criteria will lead to the end of life on earth. ‘Smiles’ and ‘statements of approval’ are not adequate roadmarks, because those are stimuli the SI can seize control of in unhumanistic ways to pump its reward buttons.
  
  “Intelligence on its own does not imply Friendliness.”
  
  That is an open question.
  
  No, it isn’t. And this is a non sequitur. Nothing else in your post calls orthogonality into question.
  - Eliezer Yudkowsky 13 Sep 2013 9:28 UTC
    2 points
    Parent
    Please take further discussion with Juno_Watt to your blog.
  - Juno_Watt 13 Sep 2013 8:57 UTC
    −2 points
    Parent
    
    Not so! An AGI need not think like a human, need not know much of anything about humans, and need not, for that matter, be as intelligent as a human.
    
    Is that a fact? No, it’s a matter of definition. It’s scarecely credible you are unaware that a lot of people think the TT is critical to AGI.
    
    The problem I’m pointing to here is that a lot of people treat ‘what I mean’ as a magical category.
    
    I can’t see any evidence of anyone invlolved in these discussions doing that. It looks like a straw man to me.
    
    Ok. NL is hard. Everyone knows that. But its got to be solved anyway.
    
    Nope!
    
    An AI you can’t talk to has pretty limited usefulness, and it has pretty limited safety too, since you don;t even have the option of telling it to stop, or expaling to it why you don;t like what it is doing. Oh, and isn’t EY assumign that an AGi will have NLP? After all, it is supposed to be able to talk its way out of the box.
    
    It’s one the SI can solve for itself.
    
    It can figure out semantics for itslef. Values are a subsert of semantics...
    
    No human being has ever created anything—no system of laws, no government or organization, no human, no artifact—that, if it were more powerful, would qualify as Friendly. I
    
    Wherer do you get this stuff from? Modern societies, with their complex legal and security systems are much less violent than ancient socieites. To take ut one example.
    
    All or nearly all humans, if they were more powerful, would qualify as Unfriendly.
    
    Gee. Then I guess they don’t have an architecutre with a basic drive to be friendly.
    
    ‘Smiles’ and ‘statements of approval’ are not adequate roadmarks, because those are stimuli the SI can seize control of in unhumanistic ways to pump its reward buttons.
    
    Why don’t humans do that?
    
    No, it isn’t.
    
    Uh-huh. MIRI has settled that centuries-aold quesiton for once and all has it?
    
    And this is a non sequitur.
    
    It can’t be a non-sequitur, since it is not an arguemnt but a statement of fact.
    
    Nothing else in your post calls orthogonality into question.
    
    So? It wasn’t relevant anywhere else.
    - Rob Bensinger 13 Sep 2013 9:47 UTC
      0 points
      Parent
      
      Is that a fact? No, it’s a matter of definition.
      
      Let’s run with that idea. There’s ‘general-intelligence-1’, which means “domain-general intelligence at a level comparable to that of a human”; and there’s ‘general-intelligence-2’, which means (I take it) “domain-general intelligence at a level comparable to that of a human, plus the ability to solve the Turing Test”. On the face of it, GI2 looks like a much more ad-hoc and heterogeneous definition. To use GI2 is to assert, by fiat, that most intelligences (e.g., most intelligent alien races) of roughly human-level intellectual ability (including ones a bit smarter than humans) are not general intelligences, because they aren’t optimized for disguising themselves as one particular species from a Milky Way planet called Earth.
      
      If your definition has nothing to recommend itself, then more useful definitions are on offer.
      
      “The problem I’m pointing to here is that a lot of people treat ‘what I mean’ as a magical category.”
      
      I can’t see any evidence of anyone invlolved in these discussions doing that. It looks like a straw man to me.
      
      http://lesswrong.com/lw/igf/the_genie_knows_but_doesnt_care/9p8x?context=1#comments
      http://lesswrong.com/lw/igf/the_genie_knows_but_doesnt_care/9p91?context=1#comments
      http://lesswrong.com/lw/igf/the_genie_knows_but_doesnt_care/9q9m
      http://lesswrong.com/lw/igf/the_genie_knows_but_doesnt_care/9qop
      http://ieet.org/index.php/IEET/more/loosemore20121128
      http://nothingismere.com/2013/09/06/the-seed-is-not-the-superintelligence/#comments
      
      ‘Mean’, ‘right’, ‘rational’, etc.
      
      An AI you can’t talk to has pretty limited usefulness
      
      An AI doesn’t need to be able to trick you in order for you to be able to give it instructions. All sorts of useful skills AIs have these days don’t require them to persuade everyone that they’re human.
      
      Oh, and isn’t EY assumign that an AGi will have NLP? After all, it is supposed to be able to talk its way out of the box.
      
      Read the article you’re commenting on. One of its two main theses is, in bold: The seed is not the superintelligence.
      
      It can figure out semantics for itslef. Values are a subsert of semantics...
      
      Yes. We should focusing on solving the values part of semantics, rather than the entire superset.
      
      Wherer do you get this stuff from? Modern societies, with their complex legal and security systems are much less violent than ancient socieites. To take ut one example.
      
      Doesn’t matter. Give an ancient or a modern society arbitrarily large amounts of power overnight, and the end results won’t differ in any humanly important way. There won’t be any nights after that.
      
      Why don’t humans do that?
      
      Setting aside the power issue: Because humans don’t use ‘smiles’ or ‘statements of approval’ or any other string of virtues an AI researcher has come up with to date for its decision criteria. The specific proposals for making AI humanistic to date have all depended on fake utility functions, or stochastic riffs on fake utility functions.
      
      Uh-huh. MIRI has settled that centuries-aold quesiton for once and all has it?
      
      Lots of easy questions were centuries old when they were solved. ‘This is old, therefore I’m not going to think about it’ is a bad pattern to fall into. If you think the orthogonality thesis is wrong, then give an argument establishing agnosticism or its negation.
      - TAG 11 May 2023 14:31 UTC
        1 point
        Parent
        
        I can’t see any evidence of anyone invlolved in these discussions doing that. It looks like a straw man to me.
        
        ‘Mean’, ‘right’, ‘rational’, etc.
        
        If want to be sure that these terms, as used by a particular person, are magical categories, you need to ask the particular person whether they have a mechanical interpretation in mind—address the argument, not the person.
        
        Whether any particular person has a mechanical interpretation of these concepts in mind cannot be shown by a completely general argument like Ghosts in the Machine . You don’t think that your use of ‘Mean’, ‘right’, ‘rational’, etc is necessarily magical! But whether someone has a non-magical explanation can easily be shown by asking them. In particular, it is highly reasonable to assume that an actual AI researcher would have such an interpretation. It is not reasonable to interpret sheer absence of evidence—especially a wilful absence of evidence, based on refusal to engage—as evidence of magical thinking.
        
        At the time of writing the MIRI/LW side of this debate is known to be wrong… and that is not despite good, rational epistemology, it is *because of *bad, dogmatic, Ad Hominen, debate. There are multiple occasions where EY instructs his followers not to even engage with the side that turned out to be correct.,