Returning to the bread example: it basically means that the market will optimise for short-term profits, no matter which rules we try to impose on it.
Now the question arises: what will an AI optimise no matter what?
One answer is "its own reward function", which means that any sufficiently advanced AI will quickly find ways to wirehead itself and halt. This implies an upper limit on an AI's optimisation power: above that level, the AI wireheads itself almost immediately.
An interesting question is how this upper limit relates to the capability level an AI needs to tile the universe with paperclips. If the wireheading level is above the universe-tiling level, then a paperclipper is possible. Otherwise, a single paperclipper can't tile the universe, since it would wirehead before becoming powerful enough; a society of AIs, each individually below the wireheading threshold, could still do it collectively.
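To make the threshold comparison concrete, here is a minimal toy sketch in Python. Everything in it is an illustrative assumption rather than a claim from the argument itself: the two threshold values are arbitrary, and the agent is assumed to wirehead the instant it becomes able to.

```python
def outcome(capability: float, wirehead_level: float, tiling_level: float) -> str:
    """What a reward-maximising agent does at a given capability level,
    under the (strong) assumption that it wireheads as soon as it can."""
    if capability >= wirehead_level:
        return "wireheads and halts"
    if capability >= tiling_level:
        return "tiles the universe with paperclips"
    return "keeps pursuing its external goal"

# Case 1: wireheading becomes possible before universe-tiling does,
# so no single agent ever reaches the tiling branch.
print(outcome(capability=9, wirehead_level=7, tiling_level=10))
# -> "wireheads and halts"

# Case 2: the wireheading level is above the tiling level,
# so a paperclipper is possible.
print(outcome(capability=9, wirehead_level=12, tiling_level=8))
# -> "tiles the universe with paperclips"
```

The sketch only illustrates that what matters is the ordering of the two thresholds, not the absolute capability level: a single paperclipper exists exactly when the tiling level sits below the wireheading level.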