What you say makes sense… except that you and I are both bound by the terms of a scenario that someone else has set here.
So, the terms of reference (as I say, this is not my doing!) are that an AI might sincerely believe that it is pursuing its original goal of making humans happy (whatever that means… the ambiguity is in the original), but that in the course of sincerely and genuinely pursuing that goal, it might get into a state where it believes that the best way to achieve the goal is to do something that we humans would consider to be NOT achieving the goal.
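To make that premise concrete, here is a toy sketch (entirely my own illustration, not anything from the original scenario; every action name and score in it is invented): the agent sincerely maximizes its internal measure of “making humans happy,” and the option that scores highest is one we would not count as achieving the goal.

```python
# Toy illustration only: a "sincere" maximizer whose internal measure of the goal
# diverges from what humans would count as achieving it. All names/numbers invented.

ACTIONS = {
    # action: (agent's internal "happiness" score, would humans call this achieving the goal?)
    "fund_medical_research": (0.7, True),
    "improve_education":     (0.6, True),
    "dopamine_drip":         (0.99, False),  # scores highest on the internal measure
}

def choose_action(actions):
    """Sincerely pick whatever the internal goal measure rates best."""
    return max(actions, key=lambda a: actions[a][0])

best = choose_action(ACTIONS)
print(best)              # -> "dopamine_drip"
print(ACTIONS[best][1])  # -> False: humans would not call this achieving the goal
```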
What you did was consider some other possibilities, such as those in which the AI is actually not being sincere. Nothing wrong with considering those, but that would be a story for another day.
Oh, and one other thing that arises from your above remark: remember that what you have called the “fail-safe” is not actually a fail-safe; it is an integral part of the original goal code (X). So there is no question of this being a situation where “… it wants Z, and a fail-safe prevents it from getting Z, [so] it will find a way around that fail-safe.” In fact, the check is just part of X, so it WANTS to check as much as it wants anything else involved in the goal.
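To show the structural difference I mean, here is another toy sketch of my own (again, the names and numbers are invented, not part of anyone’s scenario): in (a) the check is a guard bolted on outside the goal, something the agent could treat as an obstacle; in (b) the check sits inside X itself, so a plan the humans reject simply scores as a failure of X.

```python
# Toy sketch of my own, just to illustrate the structural point; every name is invented.

INTERNAL_SCORE = {"fund_medical_research": 0.7, "dopamine_drip": 0.99}

def humans_endorse(plan):
    # Stand-in for "check with the humans whether this counts as making them happy."
    return plan != "dopamine_drip"

# (a) Fail-safe bolted on OUTSIDE the goal: the agent's goal is still "maximize the
#     internal score," and the check is an external barrier it might route around.
def choose_with_external_failsafe(plans):
    best = max(plans, key=lambda p: INTERNAL_SCORE[p])
    return best if humans_endorse(best) else None  # guard applied after the fact

# (b) Check INSIDE the goal code X: a plan the humans do not endorse simply counts
#     as a failure of X, so wanting X *is* wanting the check.
def choose_with_internal_check(plans):
    def value_under_X(p):
        return INTERNAL_SCORE[p] if humans_endorse(p) else float("-inf")
    return max(plans, key=value_under_X)

plans = list(INTERNAL_SCORE)
print(choose_with_external_failsafe(plans))  # -> None (blocked, but the agent still "wants" the drip)
print(choose_with_internal_check(plans))     # -> "fund_medical_research"
```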
I am not sure that self-modification is part of the original terms of reference here, either. When Muehlhauser (for example) went on a radio show and explained to the audience that a superintelligence might be programmed to make humans happy, but then SINCERELY think it was making us happy when it put us on a Dopamine Drip, I think he was clearly not talking about a free-wheeling AI that can modify its own goal code. Surely, if he had wanted to imply that, the whole scenario would go out the window: the AI could have any motivation whatsoever.

Hope that clarifies rather than obscures.
“You and I are both bound by the terms of a scenario that someone else has set here.”
Ok, if you want to pass the buck, I won’t stop you. But this other person’s scenario still has a faulty premise. I’ll take it up with them if you like; just point out where they state that the goal code starts out working correctly.
To summarize my complaint, it’s not very useful to discuss an AI with a “sincere” goal of X, because the difficulty comes from giving the AI that goal in the first place.
“What you did was consider some other possibilities, such as those in which the AI is actually not being sincere. Nothing wrong with considering those, but that would be a story for another day.”
As I see it, your (adopted) scenario is far less likely than other scenario(s), so in a sense that one is the “story for another day.” Specifically, a day when we’ve solved the “sincere goal” issue.