How concerning do you find 'mind crime' relative to perverse instantiation, as a failure mode?

Any level of perverse instantiation in a sufficiently powerful AI is likely to lead to total UFAI; i.e. a full existential catastrophe. Either you get the AI design right so that it doesn't wirehead itself (or others, against their will) or you don't. I don't think there's much middle ground.

OTOH, the relevance of Mind Crime really depends on the volume. The FriendlyAICriticalFailureTable has this instance:
22: The AI, unknown to the programmers, had qualia during its entire childhood, and what the programmers thought of as simple negative feedback corresponded to the qualia of unbearable, unmeliorated suffering. All agents simulated by the AI in its imagination existed as real people (albeit simple ones) with their own qualia, and died when the AI stopped imagining them. The number of agents fleetingly imagined by the AI in its search for social understanding exceeds by a factor of a thousand the total number of humans who have ever lived. Aside from that, everything worked fine.
This scenario always struck me as a (qualified) FAI success. There’s a cost—and it’s large in absolute terms—but the benefits will outweigh it by a huge factor, and indeed by enough orders of magnitude that even a slight increase in the probability of getting pre-empted by a UFAI may be too expensive a price to pay for fixing this kind of bug.
So cases like this, where it only happens until the AI matures sufficiently, becomes able to see that its values make this a bad idea, and stops doing it, aren't as bad as an actual FAI failure.
Of course, if it's an actual problem with the AI's value content, one that causes the AI to keep doing this kind of thing throughout its existence, then the harm may well outweigh any good the AI ever does. The total cost in that case becomes hard to predict, depending crucially on just how many resources the AI spends on these simulations and how nasty they are on average.
If we approach condition C of the simulation argument, that is, if there are many more simulated beings than apparently real ones, then we should increase our credence accordingly that we are simulated.
If 63 billion simulated humans and 7 billion apparently real ones exist, we have—via anthropic reasoning—a 90% probability of being simulated. If we then don’t care about mind crime, we would be 90% likely to be judging beings in our reference class to be morally unworthy.
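For what it's worth, the 90% figure is just the simulated share of the combined population under a self-sampling assumption; a minimal sketch, using the same illustrative counts as above rather than any real data:

```python
# Self-sampling estimate: if you could equally well be any observer in the
# combined population, your probability of being simulated is simply the
# simulated fraction of the total.
simulated = 63e9  # simulated humans (illustrative figure from the text)
real = 7e9        # apparently real humans

p_simulated = simulated / (simulated + real)
print(f"P(simulated) = {p_simulated:.0%}")  # -> 90%
```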