Here the Rolf attack could help; it is essentially a reverse Roko's Basilisk. The idea is that we make a precommitment now to create, in the future, many simulations containing different non-friendly AIs. Any future AI will know this as a fact about the real world's past, and will therefore raise its own estimate of the probability that it is inside a testing simulation. The increase need not be large. Even if the AI puts this probability at just 1 per cent, the expected utility of killing humans is lower than the expected utility of sparing them, because human atoms and other resources are very cheap.
That is, P(simulation) × |U(kill humans and be punished)| > U(use of human atoms and habitats), so any future AI does better by preserving humans.
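To make the arithmetic concrete, here is a minimal sketch in Python of this expected-utility comparison. All the numbers (the simulation credence, the punishment-to-gain ratio) are illustrative assumptions chosen only to show the structure of the argument, not values from the argument itself:

```python
# Minimal sketch of the expected-utility comparison behind the Rolf attack.
# All numbers below are illustrative assumptions.

p_simulation = 0.01     # AI's credence that it is inside a testing simulation
u_punishment = -1000.0  # utility of being punished for killing humans (large negative)
u_human_atoms = 1.0     # utility of the atoms/habitats gained by killing humans (small)

# Expected utility of killing humans: punished if simulated, small gain otherwise.
eu_kill = p_simulation * u_punishment + (1 - p_simulation) * u_human_atoms
eu_spare = 0.0          # baseline: leave humans alone

print(f"EU(kill humans)  = {eu_kill:.2f}")   # -9.01 with these numbers
print(f"EU(spare humans) = {eu_spare:.2f}")
assert eu_spare > eu_kill  # even a 1% simulation credence flips the decision
```

The point of the sketch is that the conclusion is insensitive to the exact credence: as long as the punishment is sufficiently large relative to the cheap gain from human atoms, even a small probability of being in a simulation makes killing humans a losing bet.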
The Rolf attack does not give the AI any way to tell whether it is in the real world or a simulation; it only creates uncertainty.