Eliezer Yudkowsky comments on What can you do with an Unfriendly AI?

Eliezer Yudkowsky 21 Dec 2010 0:21 UTC
4 points

I can write down many such utility functions easily, as contrasted with the difficulty of describing friendliness, so I can at least hope to design an AI which has one of them.

What you can do once you can write down stable utility functions (that is, after you have solved the self-modification stability problem) but can't yet write down CEV is a whole different topic from this sort of AI-boxing!
- paulfchristiano 21 Dec 2010 1:31 UTC
  6 points
  Parent
  I don’t think I understand this post.
  
  My claim is that some stable utility functions are easier to write down than others. This is intimately related to the OP: I am claiming, as a virtue of my approach to boxing, that it leaves us with the problem of getting an AI to consistently follow a utility function, where essentially any utility function will do. I am not purporting to solve that problem here, just claiming that it is obviously no harder than friendliness, and almost obviously strictly easier.