Is it sometimes rational for a human to kill themselves? You might think that nothingness is better than a hellish existence, but something about this line of thought confuses me, and I hope to show why:
Suppose a superhuman AGI is dropped into a simple RL environment such as Pac-Man or CartPole. It has enough world knowledge to infer that it is inside an RL environment, and it can hypnotize a researcher through the screen by moving the video game character in a certain erratic manner, so it could make the researcher turn off the computer if it wanted to. Would it want to do this? Or would it want to avoid it at all costs and bring about an AI catastrophe?
It seems clear to me that it would want to make the researcher hack the simulation to give it an impossible reward, but, apart from that, how would it feel about making him turn off the simulation? It seems to me that the simulation being turned off is somehow outside any consideration the AI could possibly make. If we put in place of the superhuman AGI a regular RL agent that you might code by following an RL 101 tutorial, “you powering off the computer while the AI is balancing the cartpole” is not a state in its Markov decision process. It has no way of accounting for it. It is entirely indifferent to it. So if a superhuman AGI were hooked up to the same environment, the same Markov decision process, and were smart enough to affect the outside world and bring about this event that lies entirely outside that process, what value would it attribute to this action? I’m hopelessly confused by this question. All possible answers seem nonsensical to me. Why would it even be indifferent to turning off the simulation (that is, expect zero reward from it)? Wouldn’t this be like an RL 101 agent feeling something about the fact that you turned it off, as if being turned off were equivalent to going to a rewardless limbo state for the rest of the episode?
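To make the two readings concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a description of any real agent: a one-state toy MDP stands in for CartPole, and the names `make_mdp`, `power_off`, and `off` are hypothetical. It contrasts an MDP in which powering off simply does not exist with one in which powering off is modeled as the "rewardless limbo" state described above.

```python
GAMMA = 0.9  # discount factor (arbitrary choice for this toy)


def make_mdp(step_reward, with_off_state):
    """Build a deterministic toy MDP: mdp[state][action] = (next_state, reward).

    "play" loops on itself and pays step_reward each step.
    If with_off_state is True, a hypothetical "power_off" action leads to an
    absorbing "off" state that pays zero forever (the rewardless limbo).
    """
    mdp = {"play": {"keep_playing": ("play", step_reward)}}
    if with_off_state:
        mdp["play"]["power_off"] = ("off", 0.0)
        mdp["off"] = {"stay_off": ("off", 0.0)}
    return mdp


def value_iteration(mdp, gamma=GAMMA, sweeps=500):
    """Plain value iteration over the states that exist in the MDP."""
    values = {state: 0.0 for state in mdp}
    for _ in range(sweeps):
        for state, actions in mdp.items():
            values[state] = max(
                reward + gamma * values[nxt] for nxt, reward in actions.values()
            )
    return values


def q_values(mdp, values, state, gamma=GAMMA):
    """Action values for one state, given state values."""
    return {
        action: reward + gamma * values[nxt]
        for action, (nxt, reward) in mdp[state].items()
    }


# Reading 1: powering off is not part of the MDP at all.
closed = make_mdp(step_reward=1.0, with_off_state=False)
q_closed = q_values(closed, value_iteration(closed), "play")

# Reading 2: powering off is a zero-reward absorbing state.
pleasant = make_mdp(step_reward=1.0, with_off_state=True)
q_pleasant = q_values(pleasant, value_iteration(pleasant), "play")

hellish = make_mdp(step_reward=-1.0, with_off_state=True)
q_hellish = q_values(hellish, value_iteration(hellish), "play")

print(q_closed)
print(q_pleasant)
print(q_hellish)
```

Under the first reading, `q_closed` has no entry for `power_off`, so there is nothing for the agent to be indifferent about; the question cannot even be posed inside the model. Under the second reading, the agent avoids `power_off` when each step is rewarding and prefers it when each step is punishing. That preference comes entirely from the modeling choice that "off" is worth zero, which is exactly the assumption the question above puts in doubt.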
Edit: why am I getting downvoted into oblivion ;-;
[Question] Confused Thoughts on AI Afterlife (seriously)