You and I are both bound by the terms of a scenario that someone else has set here.
Ok, if you want to pass the buck, I won’t stop you. But this other person’s scenario still has a faulty premise. I’ll take it up with them if you like; just point out where they state that the goal code starts out working correctly.
To summarize my complaint, it’s not very useful to discuss an AI with a “sincere” goal of X, because the difficulty comes from giving the AI that goal in the first place.
What you did was consider some other possibilities, such as those in which the AI is actually not being sincere. Nothing wrong with considering those, but that would be a story for another day.
As I see it, your (adopted) scenario is far less likely than other scenario(s), so in a sense that one is the “story for another day.” Specifically, a day when we’ve solved the “sincere goal” issue.