We could also restrict the search by considering “realistic” worlds. Suppose we had to take 25 different yes-no decisions that could affect the future of humanity. This might be something like “choosing which of these 25 very different AIs to turn on and let loose together” or something more prosaic (which stocks to buy, which charities to support). This results in 2^25 different future worlds to search through: barely more than 33 million. Because there are so few worlds, they are unlikely to contain a marketing world (given the absolutely crucial proviso that none of the AIs is an IC-optimiser!).
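As a quick check on the arithmetic, 25 independent yes-no decisions give 2^25 = 33,554,432 combinations. The sketch below is only an illustration: the function names and the choice to encode each candidate world as a 25-bit integer are assumptions made here, not anything specified in the text.

```python
from typing import List

NUM_DECISIONS = 25  # from the text: 25 independent yes-no decisions


def world_count(num_decisions: int) -> int:
    """Number of distinct combinations of yes/no answers."""
    return 2 ** num_decisions


def decode_world(index: int, num_decisions: int = NUM_DECISIONS) -> List[bool]:
    """Hypothetical encoding: bit i of `index` is the answer to decision i."""
    if not 0 <= index < world_count(num_decisions):
        raise ValueError("index out of range")
    return [bool((index >> i) & 1) for i in range(num_decisions)]


if __name__ == "__main__":
    print(world_count(NUM_DECISIONS))  # 33554432
    print(decode_world(5)[:3])         # [True, False, True]
```

A search space of this size can be enumerated exhaustively, which is the sense in which there are "so few" worlds compared with an unrestricted search.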
Suppose one of the decisions is whether or not to buy stock in a small AI startup. If you buy stock, the company will go on to make a paperclip maximizer several years later. The paperclip maximizer is using CDT or similar. It reasons that it can’t make paperclips if it’s never made in the first place; that it is more likely to exist if the company that made it is funded; and that hacking IC takes a comparatively small amount of resources. The paperclip maximizer therefore has an instrumental incentive to hack the IC.
Human society is chaotic. For any decision you take, there are plausible chains of cause and effect that a human couldn’t predict, but a superintelligence could. The actions that lead to the paperclip maximiser have to be predictable by the current future predictor, as well as by the future paperclip maximiser. The chain of cause and effect could be a labyrinthine tangle of minor everyday interactions, stemming from seemingly innocuous decisions, that humans couldn’t hope to predict.
In this scenario, it might be the inspection process itself that causes problems. The human inspects a world and finds it full of very persuasive arguments as to why they should make a paperclip maximizer, along with an explanation of how to do so. (Say one inspection protocol was to render a predicted image of a random spot on Earth’s surface, and the human inspector sees the argument written on a billboard.) The human follows the instructions and makes a paperclip maximizer, rendering the decision they were supposed to be making utterly irrelevant. The paperclip maximizer covers Earth with billboards, and converts the rest of the universe into paperclips. In other words, using this protocol is lethal even for making a seemingly minor and innocuous decision like which shoelace to lace first.