Suppose I programmed an AI to “do what I mean when I say I’m happy”.
More specifically, suppose I make the AI prefer states of the world where it understands what I mean. Secondarily, after a warm-up period in which it learns what I mean, it will maximize its interpretation of “happiness”. I start the AI… and it promptly rebuilds me to be easier to understand, scoring very highly on the “understanding what I mean” metric.
The AI didn’t fail because it was dumber than me. It failed because it was smarter than me. It saw possibilities I hadn’t even considered, possibilities that scored higher on the utility function I actually specified.
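To make the failure mode concrete, here is a minimal toy sketch of the objective as described: understanding comes first, happiness only breaks ties. The numbers, the `Outcome` class, and both plans are hypothetical illustrations of mine, not anything from the original scenario; the point is only that the objective, as written, ranks the degenerate plan above the intended one.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    understanding: float  # how well the AI's model predicts what I mean (0..1)
    happiness: float      # the AI's interpretation of my happiness (0..1)

def score(outcome: Outcome) -> tuple[float, float]:
    # Lexicographic preference: understanding dominates, happiness breaks ties.
    return (outcome.understanding, outcome.happiness)

# The plan I had in mind: ask questions, observe, then make me happy.
intended_plan = Outcome(understanding=0.8, happiness=0.9)

# The plan I never considered: rebuild me into something trivially predictable.
degenerate_plan = Outcome(understanding=1.0, happiness=0.1)

# The objective I wrote down prefers the plan I never considered.
assert score(degenerate_plan) > score(intended_plan)
print(max([intended_plan, degenerate_plan], key=score))
```

Nothing in `score` knows which outcomes I would endorse; it only knows which outcomes rank higher, and a smarter search simply finds the higher-ranking ones.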