Absolutely agreed that whatever instrumental human values we think about explicitly enough to encode into our machines (like not killing passengers or pedestrians while driving from point A to point B), or that are implicit enough in the task itself that optimizing for that task will necessarily implement those values as well (like not crashing and exploding between A and B), will most likely be instantiated in machine intelligence as we develop it.
Agreed that if that’s the rule rather than the exception—that is, if all or almost all of the things we care about are either things we understand explicitly or things that are implicit in the tasks we attempt to optimize—then building systems that attempt to optimize those things, with explicit safety features, is likely to alleviate more suffering than it causes.