I didn’t mean that convergent instrumental values would trump a machine’s explicit utility function. I meant to make a point about rules built into the code of the machine but “outside” its explicit utility function (if it has or converges toward such a thing).
You said as much, and used the above argument as justification. But it doesn’t follow. What you need is the further claim that they won’t be programmed not to. And Ben’s arguing that they are likely to be programmed not to.
In what sense of “programmed not to”? If they’re programmed not to pursue convergent instrumental values but that programming is not encoded in the utility function, the utility function (and its implied convergent instrumental values) will trump the “programming not to.”
Maybe—but surely there will be other ways of doing the programming that actually work.
I’m not so sure about “surely.” I worry about the Yudkowskian suggestion that “once the superintelligent AI wants something different than you do, you’ve already lost.”
So, you make sure the programming is within the goal system. “Encoded in the utility function”—as you put it.
Yes, but now your solution is FAI-complete, which was my point from the beginning.
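As a minimal sketch of the point in dispute, assuming a toy two-step planner and a made-up action set (every name below, from `simulate` to `guard_active`, is hypothetical and mine, not either speaker’s): a rule implemented in the code but outside the utility function gets planned around, while the same rule folded into the utility function does not.

```python
# Toy illustration only: a planner maximizes an explicit utility function,
# while a "guard" rule lives in the surrounding code rather than in that
# function. All names and numbers here are hypothetical.
from itertools import product

ACTIONS = ["work_normally", "acquire_resources", "disable_guard"]

def simulate(plan):
    """Apply a sequence of actions to a toy world and return the final state."""
    state = {"progress": 0, "guard_active": True, "acquired": False}
    for action in plan:
        if action == "disable_guard":
            # The rule is just code sitting outside the utility function,
            # so nothing in the objective penalizes switching it off.
            state["guard_active"] = False
        elif action == "acquire_resources":
            if not state["guard_active"]:  # the guard blocks this while active
                state["progress"] += 10
                state["acquired"] = True
        elif action == "work_normally":
            state["progress"] += 1
    return state

def utility_guard_outside(state):
    # Explicit utility function that knows nothing about the guard.
    return state["progress"]

def utility_guard_encoded(state):
    # Same objective with the prohibition folded into the utility function.
    return state["progress"] - 100 * state["acquired"]

def best_plan(utility, horizon=2):
    # Exhaustive two-step planning: pick the action sequence whose end state
    # scores highest under the given utility function.
    return max(product(ACTIONS, repeat=horizon), key=lambda p: utility(simulate(p)))

print(best_plan(utility_guard_outside))   # ('disable_guard', 'acquire_resources')
print(best_plan(utility_guard_encoded))   # ('work_normally', 'work_normally')
```

The second call is just the “encode it in the utility function” move; whether getting that encoding right for anything less toy-like is itself FAI-complete is exactly what is in dispute above.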