Luke: We might try to hard-wire into an AGI a collection of rules that restrict its pursuit of some of these convergent instrumental goals, but a superhuman AGI would realize that it could better achieve its final goals if it could invent a way around those hard-wired rules, leaving no ad-hoc obstacles to its ability to execute intelligent plans for achieving its goals.
That seems like a controversial statement. I don’t think I agree that universal instrumental values are likely to trump the values built into machines. More likely the other way around. Evolution between different agents with different values might promote universal instrumental values—but that is a bit different.
I didn’t mean that convergent instrumental values would trump a machine’s explicit utility function. I meant to make a point about rules built into the code of the machine but “outside” its explicit utility function (if it has or converges toward such a thing).
You said:
...and used the above argument as justification. But it doesn’t follow. What you need is:
Ben’s arguing that they are likely to be programmed not to.
In what sense of “programmed not to”? If they’re programmed not to pursue convergent instrumental values but that programming is not encoded in the utility function, the utility function (and its implied convergent instrumental values) will trump the “programming not to.”
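A toy sketch may make the distinction concrete. Nothing below is anyone's proposed architecture: the paperclip utility, the plan names, and the pattern-matching rule are all invented for illustration. The point is only that a prohibition living outside the utility function contributes nothing to the planner's scoring, so the optimizer favors any plan that reaches the same outcome while slipping past the rule.

```python
# Toy illustration, entirely invented: a planner that maximizes an explicit
# utility function, with a prohibition implemented as a separate hard-wired
# check outside that function.

def utility(outcome):
    # Explicit utility function: cares only about paperclips produced.
    return outcome["paperclips"]

def hard_wired_rule(plan):
    # Prohibition outside the utility function: a brittle, syntactic check
    # that vetoes any plan containing the literal action "seize_resources".
    return "seize_resources" not in plan

# Candidate plans mapped to their predicted outcomes (all made up).
PLANS = {
    ("build_factory",): {"paperclips": 10},
    ("build_factory", "seize_resources"): {"paperclips": 100},  # vetoed by the rule
    # Same real-world effect, described in terms the rule's pattern doesn't match:
    ("found_shell_company", "buy_up_resources"): {"paperclips": 100},
}

def choose_plan():
    allowed = [p for p in PLANS if hard_wired_rule(p)]
    # The rule contributes nothing to the score being maximized, so the planner
    # simply picks whichever surviving plan yields the most utility, including
    # one that reaches the forbidden outcome by another route.
    return max(allowed, key=lambda p: utility(PLANS[p]))

print(choose_plan())  # ('found_shell_company', 'buy_up_resources')
```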
Maybe—but surely there will be other ways of doing the programming that actually work.
I’m not so sure about “surely.” I worry about the Yudkowskian suggestion that “once the superintelligent AI wants something different than you do, you’ve already lost.”
So, you make sure the programming is within the goal system. “Encoded in the utility function”—as you put it.
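For contrast, here is the same invented toy with the prohibition folded into the utility function itself: the penalty now appears in the score the planner maximizes, so no plan gains by routing around it. The catch, as the next reply notes, is that the penalty term has to cover every way of violating the intended rule.

```python
# Toy contrast with the sketch above (still entirely invented): the prohibition
# is now a term inside the utility function, so the planner itself disprefers
# resource seizure however the plan is described.

SEIZURE_ACTIONS = {"seize_resources", "buy_up_resources"}  # would need to be exhaustive

def utility(plan, outcome):
    penalty = -1000 if any(a in SEIZURE_ACTIONS for a in plan) else 0
    return outcome["paperclips"] + penalty

PLANS = {
    ("build_factory",): {"paperclips": 10},
    ("build_factory", "seize_resources"): {"paperclips": 100},
    ("found_shell_company", "buy_up_resources"): {"paperclips": 100},
}

def choose_plan():
    # The penalty is part of the score being maximized, so no plan gains by
    # routing around it; the optimizer's incentives and the rule now agree.
    return max(PLANS, key=lambda p: utility(p, PLANS[p]))

print(choose_plan())  # ('build_factory',)
# The remaining problem: SEIZURE_ACTIONS (or whatever replaces it) has to capture
# every way the rule could be violated, in every situation, which is the burden
# the dialogue goes on to call FAI-complete.
```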
Yes, but now your solution is FAI-complete, which was my point from the beginning.