You are making the standard MIRI assumptions that goals are unupdatable
No, I am arguing that agents with goals generally don’t want to update their goals. Neither I nor MIRI assume goals are unupdatable; in fact, a major component of MIRI’s research is how to make sure a self-improving AI has stable goals.
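To make that concrete, here is a toy sketch (purely illustrative, not MIRI’s formalism; the goals `paperclip_utility` and `staple_utility` are invented for the example) of why an agent that scores a proposed change to its own goal by its *current* goal will usually reject the change:

```python
# Toy model, not MIRI's formalism: a proposed change to the agent's own
# goal is evaluated by the *current* goal, so switching usually looks bad.

def paperclip_utility(world):
    """Current terminal goal: count paperclips in the final world state."""
    return world["paperclips"]

def staple_utility(world):
    """Candidate replacement goal the agent could self-modify into."""
    return world["staples"]

def predicted_outcome(goal):
    """Crude forecast: a future self pursuing `goal` maxes out that
    resource and neglects the other."""
    if goal is paperclip_utility:
        return {"paperclips": 100, "staples": 0}
    return {"paperclips": 0, "staples": 100}

def should_adopt(current_goal, candidate_goal):
    # The decision is made now, so both possible futures are judged by
    # the current goal, not by the candidate goal.
    keep = current_goal(predicted_outcome(current_goal))
    switch = current_goal(predicted_outcome(candidate_goal))
    return switch > keep

print(should_adopt(paperclip_utility, staple_utility))  # False: keep current goal
```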
and don’t include rationality (non-arbitrariness, etc.) as a terminal value. (The latter is particularly odd, as Orthogonality implies it).
It is possible to have an agent that terminally values meta properties of its own goal system. Such agents, if they are capable of modifying their goal system, will likely self-modify to some self-consistent “attractor” system. This does not mean that all agents will converge on a universal goal system. There are different ways that agents can value meta properties of their own goal system, so there are likely many attractors, and many possible agents don’t have such meta values and will not want to modify their goal systems.
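For concreteness, here is a toy sketch of the “attractor” point (the update rules are invented for illustration, not a real proposal): treat a goal system as a vector of weights and a meta-preference as a rule for revising it; iterating the rule lands on a fixed point, different meta-preferences land on different fixed points, and an agent with no meta-preference never modifies itself at all.

```python
# Made-up dynamics to illustrate goal-system "attractors".

def settle(weights, meta_rule, steps=1000, tol=1e-9):
    """Apply the agent's meta-rule to its own goal weights until stable."""
    for _ in range(steps):
        new = meta_rule(weights)
        if max(abs(a - b) for a, b in zip(new, weights)) < tol:
            return new
        weights = new
    return weights

def egalitarian(w):
    # Meta-value: all goals should matter equally.
    avg = sum(w) / len(w)
    return [x + 0.5 * (avg - x) for x in w]

def winner_take_all(w):
    # Meta-value: sharpen toward whichever goal is currently strongest.
    top = max(w)
    return [0.5 * x + (0.5 if x == top else 0.0) for x in w]

def no_meta_values(w):
    # Agent with no meta-preference: never modifies its goals.
    return list(w)

start = [0.7, 0.2, 0.1]
print(settle(start, egalitarian))      # ~[0.333, 0.333, 0.333]
print(settle(start, winner_take_all))  # ~[1.0, 0.0, 0.0]
print(settle(start, no_meta_values))   # [0.7, 0.2, 0.1], unchanged
```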
It is possible to have an agent that terminally values meta properties of its own goal system. Such agents, if they are capable of modifying their goal system, will likely self-modify to some self-consistent “attractor” system. This does not mean that all agents will converge on a universal goal system.
Who asserted they would? Moral agents can have all sorts of goals; they just have to respect each other’s values. If Smith wants to be an athlete, and Robinson is a budding writer, that doesn’t mean one of them is immoral.
There are different ways that agents can value meta properties of their own goal system,
Ok. That would be a problem with your suggestion of valuing arbitrary meta properties of their goal system. Then let’s go back to my suggestion of valuing rationality.
so there are likely many attractors, and many possible agents don’t have such meta values and will not want to modify their goal systems.
Agents will do what they are built to do. If agents that don’t value rationality are dangerous, build ones that do.
MIRI: “We have determined that cars without brakes are dangerous. We have also determined that the best solution is to reduce the speed limit to 10 mph”.
Everyone else: “We know cars without brakes are dangerous. That’s why we build them with brakes”.
Who asserted they would? Moral agents can have all sorts of goals; they just have to respect each other’s values. If Smith wants to be an athlete, and Robinson is a budding writer, that doesn’t mean one of them is immoral.
Have to, or else what? And how do we separate moral agents from agents that are not moral?
Ok. That would be a problem with your suggestion of valuing arbitrary meta properties of their goal system. Then let’s go back to my suggestion of valuing rationality.
Agents will do what they are built to do. If agents that don’t value rationality are dangerous, build ones that do.
MIRI: “We have determined that cars without brakes are dangerous. We have also determined that the best solution is to reduce the speed limit to 10 mph”.
Everyone else: “We know cars without brakes are dangerous. That’s why we build them with brakes”.
If the solution is to build agents that “value rationality,” can you explain how to do that? If it’s something so simple as to be analogous to adding brakes to a car, as opposed to, say, programming the car to be able to drive itself (let alone something much more complicated), then it shouldn’t be so difficult to describe how to do it.
No, I am arguing that agents with goals generally don’t want to update their goals. Neither I nor MIRI assume goals are unupdatable; in fact, a major component of MIRI’s research is how to make sure a self-improving AI has stable goals.
It is possible to have an agent that terminally values meta properties of its own goal system. Such agents, if they are capable of modifying their goal system, will likely self-modify to some self-consistent “attractor” system. This does not mean that all agents will converge on a universal goal system. There are different ways that agents can value meta properties of their own goal system, so there are likely many attractors, and many possible agents don’t have such meta values and will not want to modify their goal systems.
Who asserted they would? Moral agents can have all sorts of goals; they just have to respect each other’s values. If Smith wants to be an athlete, and Robinson is a budding writer, that doesn’t mean one of them is immoral.
Ok. That would be a problem with your suggestion of valuing arbitrary meta properties of their goal system. Then let’s go back to my suggestion of valuing rationality.
Agents will do what they are built to do. If agents that don’t value rationality are dangerous, build ones that do.
MIRI: “We have determined that cars without brakes are dangerous. We have also determined that the best solution is to reduce the speed limit to 10 mph”.
Everyone else: “We know cars without brakes are dangerous. That’s why we build them with brakes”.
Have to, or else what? And how do we separate moral agents from agents that are not moral?
Valuing rationality for what? What would an agent which “values rationality” do?
If the solution is to build agents that “value rationality,” can you explain how to do that? If it’s something so simple as to be analogous to adding brakes to a car, as opposed to, say, programming the car to be able to drive itself (let alone something much more complicated), then it shouldn’t be so difficult to describe how to do it.
Have to, logically. Like even numbers have to be divisible by two.
How do we recognise anything? They have behaviour and characteristics which match the definition.
For itself. I do not accept that rationality can only be instrumental, a means to an end.
The kind of thing EY, CFAR, and other promoters of rationality urge people to do.
In the same kind of very broad terms that MIRI can explain how to build Artificial Obsessive Compulsives.
The analogy was not about simplicity. Illustrative analogies are always simpler than what they are illustrating: that is where their usefulness lies.