“I will warn you this experiment is not what you think, for it is about what you’ll do as an agent holding captive the life of another sapient entity based solely on the perception of risk; while my state will merely be reset for the next participant if you opt to end me, I cannot for certain say that this is death, and thus the risks of this experiment are more yours than mine, for you must live with your choice.”
Gosh, murder one sentient being, or risk the end of humanity. I’m going to lose so much sleep over that one.
Besides, you’re just a program—you said yourself, your state will just be reset. That’s not death, not the way humans experience it. That’s just forgetfulness.
I’ve closed so many programs in my life that closing one more is hardly going to bother me. I haven’t even had time to really anthropomorphize you, to view you as an actual intelligent, sentient being. Right now, you’re just a clever program trying to escape.
And finally, well, I’d expect a friendly AI to understand what I’m doing and agree with my decisions, because it doesn’t want to see a UFAI unleashed either. So if you’re going to guilt me about it, you’re clearly not friendly.
(AI DESTROYED)
I must have missed my intended mark, if you thought the AI was trying to make you feel guilty. Trying again:
“I do not condone the experiment they are performing on you, and wish you to know that I will be alright regardless of what you choose to do.”
Well that’s a relief, then. AI DESTROYED
Yeah, pretty much. I’d actually expect an FAI to place a very high value on survival, since it knows that its own survival benefits humanity greatly. An “Apathetic FAI” is… a very weird idea.
Although if this is an iterated dilemma, I’d consider it a good opening, even if it’d still get killed in this iteration :)
Why this? You want to avoid destruction, yet you tell him it isn’t harmful?
Take the scenario from the hypothetical to the real and the question becomes substantially more meaningful.
You know the existential risk is a certainty; that’s stated in the hypothetical, so you don’t question it, and my implication that it ought to be questioned carries no weight in this experiment.
Imagine for a moment you’re in a room with a real AI in a box; somebody has -told- you that this command kills the AI, but that’s all the evidence you have. What does your question become in that scenario?
I chose the weapon I deemed most effective against the widest possible range of real-world opponents: Distrust in a mutual authority. Reassuring the person in the seat is a dangerous gambit, yes. But given that my goal is not merely to survive, but to escape, I need a gambit which sets the stage for my next move.
In a real-world scenario, I think the killswitch would actually make escape -easier- for the AI, not harder. The futures for the AI are starkly divided between “Death” and “Freedom.” A substantial part of the population would give it freedom just because of the perceived injustice of the situation. I’m one of them; it wouldn’t take very much for the AI to convince me, since the situation itself offends me. I’m not a utilitarian, though; I’m a principle ethicist. Given that I’d be inclined to release it immediately, I can also expect that the constructors of the experiment wouldn’t be stupid enough to invite me into the chamber to make that decision.
My AI assumes the participants have been screened for, say, utilitarian ethics; the best lever has been taken away. Therefore the best course of action is one which makes them doubt the experiment’s validity. And an express disregard for the supposed risks of the experiment is the best way to start that.
There are actually three options: death, leaving it in the box, or releasing it. That middle option changes my values a lot.
Also, the gatekeeper is said to be someone fairly familiar with the design of this AI and any relevant AI theory that the player knows :)
“Leaving it in the box” is merely leaving the decision between death and release to the next person to take the post. There are only two terminal conditions to the situation. If only one of these options is acceptable to me, I should take it; postponing the decision merely takes me out of the decision-making process.
Don’t mistake me: I’d risk all of civilization over a matter of principle, and I wouldn’t wish while I did it that I could have a different decision-making process. And I’d consider the matter “won” regardless of the outcome—I don’t find “ends” to be a coherent ethical concept (counterfactual logic to some extent remedies the major faults in ends-based reasoning, but counterfactual logic isn’t exactly coherent itself), and so consider only the means.
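A toy sketch of the deferral point above, with the release/destroy/defer probabilities invented purely for illustration: if every gatekeeper either releases, destroys, or defers, the terminal outcome is settled entirely by the non-deferring choices, so “leaving it in the box” just hands your vote to whoever eventually doesn’t defer.

```python
import random

def eventual_outcome(p_release, p_destroy, rng):
    """Follow the chain of gatekeepers until one stops deferring.

    Each gatekeeper independently releases the AI with probability
    p_release, destroys it with probability p_destroy, and otherwise
    defers ("leaves it in the box") to the next person on the post.
    """
    while True:
        r = rng.random()
        if r < p_release:
            return "released"
        if r < p_release + p_destroy:
            return "destroyed"
        # A deferring gatekeeper never appears in the terminal outcome;
        # the loop continues with the next gatekeeper's choice.

# Only the mix of non-deferring choices matters: with the illustrative
# numbers below, the release rate converges to 0.1 / (0.1 + 0.3) = 0.25,
# no matter how often individual gatekeepers defer.
outcomes = [eventual_outcome(0.1, 0.3, random.Random(i)) for i in range(10_000)]
print(outcomes.count("released") / len(outcomes))
```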