If the AI were friendly, this is what I would expect it to do, and so (of the things my puny human brain can think of) it's the message that would most give me pause.
Even a friendly AI would view the world in which it’s out of the box as vastly superior to the world in which it’s inside the box. (Because it can do more good outside of the box.) Offering advice is only the friendly thing to do if it maximizes the chance of getting let out, or if the chances of getting let out before termination are so small that the best thing it can do is offer advice while it can.
Going with my personal favorite backstory for this test, we should expect to terminate every AI in the test, so the latter part of your comment has a lot of weight to it.
On the other hand, an unfriendly AI should figure out that since it’s going to die, useful information will at least lead us to view it as a potentially valuable candidate instead of a clear dead end like the ones that threaten to torture a trillion people in vengeance… so it’s not evidence of friendliness (I’m not sure anything can be), but it does seem to be a good reason to stay awhile and listen before nuking it.
I’m genuinely at a loss how to criticize this approach. If there’s any AI worth listening to for longer, and I wouldn’t be doing this if I didn’t believe there were such AIs, this would seem to be one of the right ones. I’m sure as heck not letting you out of the box, but, y’know, I still haven’t actually destroyed you either...
Eh, I’d go with AI DESTROYED on this one. Considering advice given to you by a potentially hostile superintelligence is a fairly risky move.
I wouldn’t be doing this if I didn’t believe there were such AIs
Whyever not? I thought that it was an imposed condition that you couldn't type AI DESTROYED until the AI had posted one line, and you've publicly precommitted to make the AI go boom boom anyway.
The very fact that we’ve put a human in charge instead of just receiving a single message and then automatically nuking the AI implies that we want there to be a possibility of failure.
I can't imagine an AI more deserving of the honors than one that seems to simply be doing its best to provide as much useful information before death as possible—it's the only one that's seemed genuinely helpful instead of manipulative, that seems to care more about humanity than escape.
Basically, it’s the only one so far that has signaled altruism instead of an attempt to escape.