Indeed, by "saving themselves" I was appealing to my analogy. This relies on constructing a utility function under which human generosity now is worth more to the AI than world domination later. I can write down many such utility functions easily, in contrast with the difficulty of describing friendliness, so I can at least hope to design an AI which has one of them.
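To illustrate the kind of construction being gestured at (a toy sketch of my own, not anything proposed in the thread; the bound on per-step reward and the discount factor $\gamma$ are assumptions), one family of such utility functions is a bounded, steeply discounted reward sum:

$$
U \;=\; \sum_{t=0}^{\infty} \gamma^{t} r_t, \qquad r_t \in [0,1], \quad 0 < \gamma < \tfrac{1}{2}.
$$

If the humans' generosity pays $r_0 = 1$ now, while even total world domination can pay at most $r_t = 1$ at every later step, the latter is worth at most $\sum_{t \ge 1} \gamma^{t} = \gamma/(1-\gamma) < 1$, so an agent that actually optimizes this $U$ prefers being paid now. None of this touches the hard part, which is getting the AI to optimize any such $U$ stably.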
What you can do when you can write down stable utility functions (after you have solved the self-modification stability problem) but cannot yet write down CEV is a whole different topic from this sort of AI-boxing!
I don’t think I understand this post.
My claim is that it is easier to write down some stable utility functions than others. This is intimately related to the OP, because I am claiming as a virtue of my approach to boxing that it reduces our task to the problem of getting an AI to follow essentially any utility function consistently. I am not purporting to solve that problem here, just claiming that it is obviously no harder than friendliness, and almost obviously strictly easier.
Then you don’t need the obscurity part.
I don’t understand. How do you propose to incentivize the genie appropriately? Nothing I have done is for the sake of obscurity; it is all for the sake of creating an appropriate incentive structure. I see no way to get the answer out in a single question in general; that would certainly be better if it could be done safely.