Rob Bensinger comments on The genie knows, but doesn’t care

Rob Bensinger 10 Sep 2013 16:46 UTC
3 points

“code in the high-level sentence, and let the AI figure it out.”

http://lesswrong.com/lw/rf/ghosts_in_the_machine/

“Maybe we gave it the low-level expansion of ‘happy’ that we or our seed AI came up with ‘together with’ an instruction that it is meant to capture the meaning of the high-level statement”

If the AI is too dumb to understand ‘make us happy’, then why should we expect it to be smart enough to understand ‘figure out how to correctly understand “make us happy”, and then follow that instruction’? We have to actually code ‘correctly understand’ into the AI. Otherwise, even when it does have the right understanding, that understanding won’t be linked to its utility function.

“Maybe the AI will value getting things right because it is rational.”

http://lesswrong.com/lw/igf/the_genie_knows_but_doesnt_care/
- Peterdjones 10 Sep 2013 17:06 UTC
  0 points
  Parent
  
  “code in the high-level sentence, and let the AI figure it out.”
  
  http://lesswrong.com/lw/rf/ghosts_in_the_machine/
  
  So it’s impossible to directly or indirectly code in the compex thing called semantics, but possible to directly or indirectly code in the compex thing called morality? What? What is your point? You keep talking as if I am suggesting there is someting that can be had for free, without coding. I never even remotely said that.
  
  If the AI is too dumb to understand ‘make us happy’, then why should we expect it to be smart enough to understand ‘figure out how to correctly understand “make us happy”, and then follow that instruction’? We have to actually code ‘correctly understand’ into the AI. Otherwise, even when it does have the right understanding, that understanding won’t be linked to its utility function.
  
  I know. A Loosemore architecture AI has to treat its directives as directives. I never disputed that. But coding “follow these plain English instructions” isn’t obviously harder or more fragile than coding “follow <>”. And it isn’t trivial, and I didn’t say it was.
  - Rob Bensinger 10 Sep 2013 17:16 UTC
    5 points
    Parent
    
    So it’s impossible to directly or indirectly code in the compex thing called semantics, but possible to directly or indirectly code in the compex thing called morality?
    
    Read the first section of the article you’re commenting on. Semantics may turn out to be a harder problem than morality, because the problem of morality may turn out to be a subset of the problem of semantics. Coding a machine to know what the word ‘Friendliness’ means (and to care about ‘Friendliness’) is just a more indirect way of coding it to be Friendly, and it’s not clear why that added indirection should make an already risky or dangerous project easy or safe. What does indirect indirect normativity get us that indirect normativity doesn’t?
    - Eliezer Yudkowsky 10 Sep 2013 17:59 UTC
      12 points
      Parent
      Robb, at the point where Peterdjones suddenly shows up, I’m willing to say—with some reluctance—that your endless willingness to explain is being treated as a delicious free meal by trolls. Can you direct them to your blog rather than responding to them here? And we’ll try to get you some more prestigious non-troll figure to argue with—maybe Gary Drescher would be interested, he has the obvious credentials in cognitive reductionism but is (I think incorrectly) trying to derive morality from timeless decision theory.
      - Rob Bensinger 10 Sep 2013 18:12 UTC
        8 points
        Parent
        Sure. I’m willing to respond to novel points, but at the stage where half of my responses just consist of links to the very article they’re commenting on or an already-referenced Sequence post, I agree the added noise is ceasing to be productive. Fortunately, most of this seems to already have been exorcised into my blog. :)
        lukeprog 18 Sep 2013 1:44 UTC
        12 points
        Parent
        Agree with Eliezer. Your explanatory skill and patience are mostly wasted on the people you’ve been arguing with so far, though it may have been good practice for you. I would, however, love to see you try to talk Drescher out of trying to pull moral realism out of TDT/UDT, or try to talk Dalyrmple out of his “I’m not partisan enough to prioritize human values over the Darwinian imperative” position, or help Preston Greene persuade mainstream philosophers of “the reliabilist metatheory of rationality” (aka rationality as systematized winning).
    - Peterdjones 10 Sep 2013 17:25 UTC
      −1 points
      Parent
      Semantcs isn’t optional. Nothing could qualify as an AGI,let alone a super one, unless it could hack natural language. So Loosemore architectures don’t make anything harder, since semantics has to be solved anyway.
      - Rob Bensinger 10 Sep 2013 18:08 UTC
        5 points
        Parent
        It’s a problem of sequence. The superintelligence will be able to solve Semantics-in-General, but at that point if it isn’t already safe it will be rather late to start working on safety. Tasking the programmers to work on Semantics-in-General makes things harder if it’s a more complex or roundabout way of trying to address Indirect Normativity; most of the work on understanding what English-language sentences mean can be relegated to the SI, provided we’ve already made it safe to make an SI at all.
        TAG 10 May 2023 12:09 UTC
        1 point
        Parent
        
        “code in the high-level sentence, and let the AI figure it out.”
        
        http://lesswrong.com/lw/rf/ghosts_in_the_machine/
        
        It’s worth noting that using an AI’s semantic understanding of ethics to modify it’s motivational system is so unghostly, and unmysterious that it’s actually been done:
        
        https://astralcodexten.substack.com/p/constitutional-ai-rlhf-on-steroids
        
        But that doesn’t prove much, because it was never—not in 2023, not in 2013 -- the case that that kind of self-correction was necessarily an appeal to the supernatural. Using one part of a software system to modify another is not magic!
        
        The superintelligence will be able to solve Semantics-in-General, but at that point if it isn’t already safe it will be rather late to start working on safety.
        
        We have AIs with very good semantic understanding that haven’t killed us, and we are working on safety.
        What links here?
        TAG's comment on Contra Yudkowsky on Epistemic Conduct for Author Criticism by Zack_M_Davis (16 Sep 2023 14:55 UTC; -2 points)
        Peterdjones 11 Sep 2013 8:07 UTC
        −1 points
        Parent
        Then solve semantics in a seed.
  - Eliezer Yudkowsky 10 Sep 2013 17:56 UTC
    4 points
    Parent
    PeterDJones, if you wish to converse further with RobbBB, I ask that you do so on RobbBB’s blog rather than here.