Your hypothetical utility function references undefined concepts such as “taking control of”, “cooperating”, “humans”, and “self”.
If you actually try to ground your utility function and do the work of making it realistic, you quickly find that it ends up with something on the order of the complexity of a human brain; it’s not something you can easily define in a few pages of math.
Don’t get confused by the initial example, which was there purely for illustration (as I said, if you knew all these utility values, you wouldn’t need any sort of filter; you’d just set all utilities but U(B) to zero).
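For concreteness, here is a hypothetical sketch of that degenerate case. The outcome names and utility values below are made up; the point is only that if every value were already known, the “filter” collapses to zeroing everything except U(B):

```python
# Hypothetical utilities for the toy example's outcomes; the values are purely illustrative.
utilities = {"A": 3.0, "B": 7.0, "C": -2.0}

# If every value were already known, no filter machinery would be needed:
# just zero out every utility except U(B).
trivial_filter = {outcome: (u if outcome == "B" else 0.0)
                  for outcome, u in utilities.items()}
print(trivial_filter)  # {'A': 0.0, 'B': 7.0, 'C': 0.0}
```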
It’s because these concepts are hard that I focused on indifference, which, it seems, has a precise mathematical formulation. You can implement general indifference without understanding anything about U at all.
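To illustrate the “without understanding anything about U” point, here is a minimal sketch of one standard way utility indifference is often formalized (not necessarily the exact construction in the post): add a compensating constant on the branch where the event X occurs, chosen so that the expected utilities of the two branches match. The names and types (World, event_X, prob) are assumptions for illustration; the key feature is that U is only ever evaluated, never inspected.

```python
from typing import Callable, Iterable

World = str  # stand-in for a full description of a possible universe

def make_indifferent(
    U: Callable[[World], float],        # original utility, treated as a black box
    event_X: Callable[[World], bool],   # did the event we want indifference to occur?
    worlds: Iterable[World],            # the agent's (finite, for this sketch) world model
    prob: Callable[[World], float],     # the agent's probability of each world
) -> Callable[[World], float]:
    """Return U' = U + C on the X-branch, with C chosen so that the expected
    utility conditional on X equals the expected utility conditional on not-X."""
    ws = list(worlds)
    # Assumes both branches have nonzero probability in this toy sketch.
    p_x = sum(prob(w) for w in ws if event_X(w))
    p_no = sum(prob(w) for w in ws if not event_X(w))
    eu_x = sum(prob(w) * U(w) for w in ws if event_X(w)) / p_x
    eu_no = sum(prob(w) * U(w) for w in ws if not event_X(w)) / p_no
    C = eu_no - eu_x  # compensating constant; computed without ever looking inside U
    return lambda w: U(w) + C if event_X(w) else U(w)
```

Nothing in this construction depends on how complicated U is; it could be a three-entry lookup table or a brain-sized model, and the correction is built the same way.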
I’m skeptical, then, about the entire concept of ‘utility function filters’, as it seems their complexity would be on the order of, or greater than, that of the utility function itself, and you would need to keep constructing an endless sequence of such complex filters.
The description of the filter is in this blog post; a bit more work is needed to check that certain universes are indistinguishable up until X, but this can be approximated if needed.
U, on the other hand, can be arbitrarily complex.
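As a rough illustration of the “indistinguishable up until X” check mentioned above (this is only a guess at one concrete reading, not the post’s definition): treat a universe as a sequence of observations and compare the prefixes before X; the approximation would compare only a coarse-grained summary of those prefixes. The names Observation, t_X, and summarize are illustrative assumptions.

```python
from typing import Any, Callable, Sequence

Observation = Any  # whatever the agent can observe at each time step

def indistinguishable_until(u1: Sequence[Observation],
                            u2: Sequence[Observation],
                            t_X: int) -> bool:
    """Exact check: the two universes agree on every observation before time t_X."""
    return list(u1[:t_X]) == list(u2[:t_X])

def approx_indistinguishable_until(u1: Sequence[Observation],
                                   u2: Sequence[Observation],
                                   t_X: int,
                                   summarize: Callable[[Sequence[Observation]], Any]) -> bool:
    """Approximation: compare only a coarse-grained summary of the pre-X histories."""
    return summarize(u1[:t_X]) == summarize(u2[:t_X])
```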