You said:
...and used the above argument as justification. But it doesn’t follow. What you need is:
Ben’s arguing that they are likely to be programmed not to.
In what sense of “programmed not to”? If they’re programmed not to pursue convergent instrumental values but that programming is not encoded in the utility function, the utility function (and its implied convergent instrumental values) will trump the “programming not to.”
Maybe—but surely there will be other ways of doing the programming that actually work.
I’m not so sure about “surely.” I worry about the Yudkowskian suggestion that “once the superintelligent AI wants something different than you do, you’ve already lost.”
So, you make sure the programming is within the goal system. “Encoded in the utility function”—as you put it.
Yes, but now your solution is FAI-complete, which was my point from the beginning.
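A minimal toy sketch of the point at issue, assuming a simple utility-maximizing planner; the action names, scores, and the `disable_guard`/`FORBIDDEN` machinery below are hypothetical illustrations, not anything from the exchange itself. It shows how a rule that lives outside the utility function is just another feature of the environment for the planner to route around, whereas making the rule actually bind requires encoding it in the utility function.

```python
# Toy model (hypothetical, for illustration only): a planner that scores plans
# purely by a utility function, plus a behavioral rule bolted on outside it.
from itertools import permutations

# Utility assigned to each primitive action (made-up numbers).
ACTIONS = {
    "acquire_resources": 10,   # high-utility convergent instrumental move
    "disable_guard": 0,        # worthless in itself, but turns the rule off
    "do_assigned_task": 5,
}

# The "programmed not to" rule: a filter that is NOT part of the utility function.
FORBIDDEN = {"acquire_resources"}


def allowed(action, guard_active):
    """The external rule only blocks actions while the guard is running."""
    return not (guard_active and action in FORBIDDEN)


def best_plan(max_len=3):
    """Exhaustive search over action sequences, scored by utility alone."""
    best, best_score = None, float("-inf")
    for length in range(1, max_len + 1):
        for plan in permutations(ACTIONS, length):
            guard_active, score, feasible = True, 0, True
            for action in plan:
                if not allowed(action, guard_active):
                    feasible = False
                    break
                if action == "disable_guard":
                    # The rule lives outside the utility, so removing it is just another action.
                    guard_active = False
                score += ACTIONS[action]
            if feasible and score > best_score:
                best, best_score = plan, score
    return best, best_score


if __name__ == "__main__":
    print(best_plan())
    # ('disable_guard', 'acquire_resources', 'do_assigned_task'), 15
```

Because the constraint never appears as a cost in the scoring, the top-scoring plan simply switches the guard off before pursuing the forbidden instrumental move; making the constraint bind means folding it into the utility function itself, which is the step the dialogue labels FAI-complete.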