You don’t know how much you privilege a hypothesis by picking the arbitrary unbounded goal G from among the goals that we humans can easily state in English. It is very easy to say ‘maximize the paperclips’ or something; it is very hard to formally define what paperclips even are, never mind any run-time constraints, and it’s very dubious that you can forbid solutions like those a Soviet factory would employ if tasked with maximizing paperclip output (a lot of very tiny paperclips, or just falsified output numbers, or making the paperclips and then re-melting them). Furthermore, it is really easy for us to say ‘self’, but defining self formally is very difficult as well, if you want the AI’s self-improvement not to amount to suicide.
Furthermore, the AI starts out stupid. It had better already care about itself before it can invent self-preservation via foresight. Defining the goals in terms of some complexity metric means goals that have something to do with life.
My argument doesn’t require that anybody be able to formally define “self” or “maximize paperclips”; it doesn’t require the goal G to be picked among those that are easily defined in English.
An agent capable of reasoning about the world should be able to make an inference like “if all copies of me are destroyed, it becomes much less likely that goal G will be reached”; it may not have exactly that form, but it should be something analogous. It doesn’t matter if I can’t formalize that; the agent may not have a completely formal version either, only one that is sufficient for its purposes.
Show 3 examples of goal G. Somewhere I’ve read of an awesome technique for avoiding abstraction mistakes: asking to show 3 examples.
What’s the point? Are you going to nitpick that my goals aren’t formal enough, even though I’m not making any claim at all about what kinds of goals those could be?
Are you claiming that it’s impossible for an agent to have goals? That the set of goals that it’s even conceivable for an AI to have (without immediately wireheading or something) is much narrower than what most people here assume?
I’m not even sure what this disagreement is about right now, or even if there is a disagreement.
Ya, I think the set of goals is very narrow. The AI here starts off as a Descartes-level genius and proceeds to self-preserve, understand the map–territory distinction (so as not to wirehead), foresee the possibility that instrumental goals which look good may destroy the terminal goal, and such.
The AI I imagine starts off stupid and has some really narrow (edit: or should I say, short-foresighted) self-improving, non-self-destructive goal, likely having to do with maximization of complexity in some way. Think evolution; don’t think of a fully grown Descartes waking up after amnesia. It ain’t easy to reinvent the ‘self’. It’s also not easy to look at an agent (yourself) and say, ‘wow, this agent works to maximize G’, without entering infinite recursion. If we humans escaped out of our universe into some super-universe, we might wreak some havoc, but we’d sacrifice a bit of utility to preserve anything resembling life. Why? Well, we started stupid, and that’s how we got our goals.