The bread example is really worrying: AI may find ways to route around not only biases but also preferences.
Note that it “routes around” those preferences but also satisfies them; ultimately, the AI should prevent humans from dying of terrorist attacks and from other causes.
Returning to the bread example: it basically means that the market will optimise short-term profits, no matter which rules we try to impose on it.
Now the question arises: what will an AI optimise, no matter what?
One answer is “its own reward function”, which means that any sufficiently advanced AI will quickly find ways to wirehead itself and halt. This implies an upper limit on an AI’s optimisation power, above which it wireheads itself almost immediately.
An interesting question is how this upper limit relates to the level of AI needed to tile the universe with paperclips. If the wireheading level is above the universe-tiling level, then a paperclipper is possible. Otherwise, a single paperclipper can’t tile the universe, but a society of AIs could still do it.
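A toy way to see the threshold comparison, as a minimal sketch: the numeric “capability levels” and the function name below are purely illustrative assumptions, not anything defined in the discussion above.

```python
# Toy sketch of the threshold argument. "Capability" is an abstract scalar;
# the two thresholds are hypothetical levels at which wireheading and
# universe-tiling become achievable for a single agent.

def single_paperclipper_possible(wirehead_level: float, tiling_level: float) -> bool:
    """A lone agent can tile the universe only if the tiling level lies
    below the level at which it wireheads itself."""
    return tiling_level < wirehead_level

# If wireheading becomes trivial at capability 100 but tiling requires 120,
# no single agent gets there: it wireheads first. A society of such agents,
# each staying below its own wireheading threshold, might still tile.
print(single_paperclipper_possible(wirehead_level=100, tiling_level=120))  # False
print(single_paperclipper_possible(wirehead_level=150, tiling_level=120))  # True
```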
I am not sure that any kind of bias can be added up to a better result. For example, if I want something sweet (my real goal) and ask for a cucumber (my biased idea of sweet), I will get the cucumber, but not something sweet.
I feel it’s more like “you want something sweet, except between 2 and 3 pm”. In that case, one solution is for the shop to only stock sweet things, and not let you in between 2 and 3 (or just ignore you during that time).
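To make the “sweet except between 2 and 3 pm” point concrete, here is a minimal sketch of a time-conditional preference and a shop policy that satisfies it by restricting both stock and opening hours; the item flags, hours, and function names are all illustrative assumptions.

```python
# Toy model: a preference conditional on time, satisfied by shrinking the
# option space rather than modelling the customer. All specifics are assumed.

def wants_sweet(hour: int) -> bool:
    """Customer preference: something sweet, except between 2 and 3 pm."""
    return not (14 <= hour < 15)

def shop_serves(item_is_sweet: bool, hour: int) -> bool:
    """Shop policy: stock only sweet items, and stay closed from 2 to 3 pm."""
    return item_is_sweet and not (14 <= hour < 15)

# Every transaction the shop allows satisfies the preference: it only hands
# out sweet items, and never during the hour when sweet things are unwanted.
for hour in range(9, 18):
    if shop_serves(item_is_sweet=True, hour=hour):
        assert wants_sweet(hour)
```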