Here’s the argument I was hearing:
Humans can be turned into money pumps. Consequently, the most important thing is to make sure that your AI can also be turned into a money pump, since if it can’t, it will automatically diverge from human values.
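For concreteness, here is a minimal sketch of the money pump itself (the three options and the fee are my own assumptions, not part of the conversation): an agent with cyclic preferences will pay a small fee for each “upgrade” around the cycle and end up holding exactly what it started with, only poorer.

```python
# Minimal money-pump sketch (assumed setup: strictly cyclic preferences over A, B, C).
# The agent prefers A over B, B over C, and C over A, and will pay a small fee to
# trade up to anything it prefers over its current holding.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}   # (preferred, dispreferred) pairs
FEE = 0.01

holding, wealth = "A", 10.0
for _ in range(30):                              # the cycle never ends; stop after 30 trades
    for candidate in ("A", "B", "C"):
        if (candidate, holding) in prefers:      # candidate is strictly preferred
            holding, wealth = candidate, wealth - FEE
            break

print(holding, round(wealth, 2))                 # "A 9.7": back where it started, 0.30 poorer
```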
If this is what you are arguing, it would take a lot to convince me of that position.
Here’s the argument I think you’re making:
Don’t make AIs try to optimize stuff without bound. If you try to optimize any fixed objective function without bound, you will end up sacrificing all else that you hold dear.
I agree that optimizing without bound seems likely to kill you. If a safe alternative approach is possible, I don’t know what it would be; my guess is that most alternative approaches are equivalent to an optimization problem.
Right, the second argument is the one that concerns me. The first one doesn’t worry me as much, since it should be possible to convince people to adjust their preferences in some way that makes them consistent.
My suggestion here was simply to adopt a hard limit on the utility function. So, for example, instead of valuing lifespan without limit, there would be some value beyond which the AI is indifferent to extending it any further. This kind of AI might take the lifespan deal up to a certain point, but it would not keep taking it indefinitely, and in this way it would avoid driving its probability of survival down toward zero.
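A minimal sketch of that behaviour, with made-up numbers standing in for the lifespan deal (each accepted round multiplies lifespan by 1000 but multiplies survival probability by 0.99): an agent whose utility grows without bound in lifespan accepts every round it is offered, while one whose utility is capped declines further rounds once the cap is effectively reached.

```python
# Stylised lifespan deal (assumed numbers, purely for illustration): each accepted
# round multiplies remaining lifespan by 1000 but multiplies survival probability by 0.99.
LIFESPAN_FACTOR, PROB_FACTOR = 1000.0, 0.99

def unbounded_utility(years):
    return years                       # values every extra year, without limit

def capped_utility(years, cap=1e12):
    return min(years, cap)             # indifferent to anything beyond the cap

def rounds_accepted(utility, years=100.0, prob=1.0, max_rounds=50):
    """Accept a deal whenever it raises expected utility; count how many get accepted."""
    for n in range(max_rounds):
        keep = prob * utility(years)
        take = prob * PROB_FACTOR * utility(years * LIFESPAN_FACTOR)
        if take <= keep:
            return n
        years, prob = years * LIFESPAN_FACTOR, prob * PROB_FACTOR
    return max_rounds

print(rounds_accepted(unbounded_utility))  # 50 -- takes every deal offered within the horizon
print(rounds_accepted(capped_utility))     # 4  -- stops once the cap is effectively reached
```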
I think Eliezer does not like this idea because he claims to value life infinitely, assigning ever greater values to longer lifespans and an infinite value to an infinite lifespan. But he is wrong about his own values, because being a limited being he cannot actually care infinitely about anything, and this is why the lifespan dilemma bothers him. If he actually cared infinitely, as he claims, then he would not mind driving his probability of survival down to zero.
I am not saying (as he has elsewhere described this) that “the utility function is up for grabs.” I am saying that if you understand yourself correctly, you will see that you do not yourself assign an infinite value to anything, so it would be a serious and possibly fatal mistake to make a machine that assigns an infinite value to something.
Yeah, I follow. I’ll bring up another wrinkle (which you may already be familiar with): suppose the objective you’re maximizing never equals or exceeds 20. You can get to 19.994, 19.9999993, 19.9999999999999995, but never actually reach 20. Then even though your objective function is bounded, you will still try to optimize forever, and may resort to increasingly desperate measures to eke out another .000000000000000000000000001.
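To put numbers on that wrinkle (the functional form 20·(1 − 2^(−x)) is my own choice, purely illustrative): the objective below is bounded above by 20, yet every extra unit of resources still buys a strictly positive sliver of value, so the maximizer never has a reason to stop.

```python
# Illustrative asymptotically-bounded objective (assumed form): bounded above by 20,
# but strictly increasing, so more resources always help at least a little.
def objective(resources):
    return 20.0 * (1.0 - 2.0 ** (-resources))

for r in (1, 10, 20, 50):
    gain = objective(r + 1) - objective(r)
    print(f"resources={r:3d}  value={objective(r):.15f}  marginal gain={gain:.2e}")
# The value crawls toward 20, but the marginal gain never reaches zero, so the
# incentive to grab another unit of resources never quite disappears.
```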
Yes, this would happen if you take an unbounded function and simply map it onto a bounded one without actually changing what it values. That is why I am suggesting admitting that you really don’t have an infinite capacity for caring, and that describing what you care about as though you cared infinitely is mistaken, whether you describe this with an unbounded or with a bounded function. This requires admitting that scope insensitivity, past a certain point, is not a bias but an objective fact: at a certain point you really don’t care anymore.
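To make the contrast concrete (both functions below are illustrative choices of mine): merely squashing an unbounded score through a bounded curve leaves the marginal value positive everywhere, while a utility that genuinely saturates assigns exactly zero marginal value past the point where you stop caring.

```python
# Two bounded utilities over the same underlying quantity (illustrative forms only):
def squashed(x):
    return 20.0 * x / (x + 1.0)    # bounded by 20, but strictly increasing forever

def saturating(x, cap=100.0):
    return min(x, cap)             # genuinely flat once x reaches the cap

for x in (50.0, 100.0, 1000.0, 1_000_000.0):
    print(f"x={x:>9.0f}  squashed gain={squashed(x + 1) - squashed(x):.3e}  "
          f"saturating gain={saturating(x + 1) - saturating(x):.1f}")
# The squashed agent still gains something from every extra unit; the saturating
# agent gains exactly 0.0 beyond the cap, so it has no reason to keep pushing.
```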