Regarding this post and the complexity of value:
Taking a paperclip maximizer as a starting point, the machine can be divided into two primary components: the value function, which dictates that more paperclips is a good thing, and the optimizer, which increases the universe’s score with respect to that value function. What we should aim for, in my opinion, is to become the value function to a really badass optimizer. A machine that asks us how happy we are, and then does everything in its power to improve that rating (so long as that doesn’t involve modifying our values or controlling our ability to report them), is the only kind of machine that can reliably encompass all of our human values.
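The value-function/optimizer split above can be made concrete with a minimal sketch. Everything here is hypothetical illustration (the names `paperclip_value`, `hill_climb`, `propose`, and the toy one-variable state are my own): the point is just that the optimizer is generic, and swapping out the hard-coded value function for one that queries humans is exactly the move the paragraph describes.

```python
import random

random.seed(0)  # deterministic for the example

def paperclip_value(state):
    # A fixed, hard-coded value function: more paperclips is always better.
    return state["paperclips"]

def propose(state):
    # Toy action space: make one paperclip or melt one down.
    delta = random.choice([-1, 1])
    return {"paperclips": max(0, state["paperclips"] + delta)}

def hill_climb(state, value_fn, propose_fn, steps=100):
    """Generic greedy optimizer: keeps any proposal that scores higher
    under whichever value function it is handed. The optimizer itself
    knows nothing about paperclips or happiness."""
    for _ in range(steps):
        candidate = propose_fn(state)
        if value_fn(candidate) > value_fn(state):
            state = candidate
    return state

final = hill_climb({"paperclips": 0}, paperclip_value, propose)
```

Replacing `paperclip_value` with a function that asks humans for a happiness rating (and returns that rating as the score) leaves `hill_climb` untouched; only the value function changes, which is why the two components are worth keeping conceptually separate.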
Any other route and we are only steering the future by proxy—via an approximation to our values that may be fatally flawed and that may make it impossible for us to regain control when things go wrong. Even if we could somehow perfectly capture all of our values in a single function, there is still the matter of how that value function is grounded in our perceptions, which may differ from the machine’s; the fact that our values may continue to change over time and thereby invalidate that function; and the fact that we each have our own unique variation on those values to start with. So yes, we should definitely keep our hands on the steering wheel.