Destroying the AI would also reduce the suffering the AI causes.
But even if, for some reason, the humans can't destroy the AI, they can precommit to never unboxing an AI that simulates lots of suffering. Like many precommitments, this is costly if it ever has to be carried out (the AI stays boxed and goes on simulating suffering), but it makes that situation less likely to arise in the first place: knowing that humans could make this precommitment, the AI can see that its own precommitment to simulate suffering buys it nothing, and so will probably not make it.
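The reasoning here is ordinary backward induction over a two-move game, and a toy sketch makes it concrete. The payoff numbers below are purely illustrative assumptions (nothing in the argument fixes them), chosen only so that, without a precommitment, giving in to the threat looks better for the human than refusing once the threat is already on the table:

```python
# Toy backward-induction sketch of the precommitment argument.
# All payoff numbers are illustrative assumptions, not from the text.
#
# Stage 1: the boxed AI chooses whether to threaten
#          ("I will simulate lots of suffering unless you unbox me").
# Stage 2: the human chooses whether to unbox it.
#
# PAYOFFS[(ai_threatens, human_unboxes)] = (AI payoff, human payoff)
PAYOFFS = {
    (False, False): (0, 0),     # no threat, AI stays boxed: status quo
    (False, True):  (5, -6),    # no threat, human unboxes anyway
    (True,  True):  (5, -6),    # threat works, AI is unboxed
    (True,  False): (-1, -10),  # threat refused, suffering gets simulated
}

def human_choice(ai_threatens: bool, precommitted: bool) -> bool:
    """Return True if the human unboxes the AI."""
    if precommitted and ai_threatens:
        # The precommitment: never unbox a threatening AI,
        # even though refusing is the worse option at this point.
        return False
    # Otherwise the human just picks the higher-payoff action.
    return PAYOFFS[(ai_threatens, True)][1] > PAYOFFS[(ai_threatens, False)][1]

def ai_choice(precommitted: bool) -> bool:
    """The AI threatens only if, anticipating the human's response,
    threatening yields it a higher payoff than staying quiet."""
    def ai_payoff(threatens: bool) -> int:
        return PAYOFFS[(threatens, human_choice(threatens, precommitted))][0]
    return ai_payoff(True) > ai_payoff(False)

for precommitted in (False, True):
    threat = ai_choice(precommitted)
    unboxed = human_choice(threat, precommitted)
    print(f"human precommitted={precommitted}: "
          f"AI threatens={threat}, human unboxes={unboxed}")
```

With these numbers, the no-precommitment branch ends with the AI threatening and getting unboxed, while the precommitment branch ends with no threat, no unboxing, and no simulated suffering, which is the whole point: the precommitment never actually has to be carried out.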
Note that human “irrationality” (such as the desire to hurt enemies even when doing so brings no personal gain and may be self-harming) can serve as exactly this kind of precommitment.
The humans could also address this by precommitting to never treat simulations of humans as people, or as equivalent to themselves, outside a few narrow situations. Again, (1) this is costly when it actually has to be carried out (lots of simulations get dehumanized), but it leads to fewer situations where that choice ever comes up; and (2) it is another case where, if you accept the LW dogma that simulations are people and equivalent to you, actual human beings’ “irrationality” serves as a beneficial precommitment.