Worse ones are easy to come up with just by looking at the kinds of accidents that actually happen with software. E.g. a critical typo flips the sign of the utility value. The resulting AI is truly unfriendly, and simulates an enormously larger number of suffering beings than the maximum number of happy beings that could exist. Or the backstory of the Terminator movies: the AI has determined that maximum human value is achieved through an epic struggle against the machines.
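To make that failure mode concrete, here is a toy sketch (the outcomes and scores are invented purely for illustration, not any real design): a single flipped sign turns "pick the best outcome" into "pick the worst".

```python
# Toy sketch with made-up outcomes and scores: a one-character sign error
# turns a maximizer of the best outcome into a maximizer of the worst.

outcomes = {
    "simulate happy beings": 10.0,
    "leave humanity alone": 0.0,
    "simulate suffering beings": -10.0,
}

def utility(outcome):
    return outcomes[outcome]

def buggy_utility(outcome):
    return -outcomes[outcome]   # the critical typo: one flipped sign

print(max(outcomes, key=utility))        # -> "simulate happy beings"
print(max(outcomes, key=buggy_utility))  # -> "simulate suffering beings"
```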
When you’re not in destructor-mode (alternatively: Hulk-Smash mode), you’re full of interesting ideas:
I wonder if discovering/inventing the “ultimate” (a really good) theory of friendliness also implies a really good theory of unfriendliness. Of course, the inverse of “really friendly” isn’t “really unfriendly” (but instead “everything other than really friendly”); still, if friendliness theory yields a specific utility function, it may be a small step to make the ultimate switch.
Let’s hope there’s no sneaky Warden Dios (of Donaldson’s Gap Cycle) who makes a last minute modification to the code before turning it on.
Well, it’s not that it’s hard to come up with; IMO it’s that hardly anyone ever actually thinks about artificial intelligence. Hardly anyone thinks of reducible intelligence, either.
Actually, friendliness is a utility function, which means it ranks every possibility from least to most friendly. It must be able to determine the absolute worst, most unfriendly possible outcome so that, if it becomes a possibility, the AI knows how desperately to avoid it.
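As a toy illustration (the outcomes and numbers are invented), the ranking view means the “absolute worst” outcome is just the bottom of the ordering the utility function induces:

```python
# Made-up outcomes and scores: a utility function orders every option,
# so the worst case is simply the minimum of that ordering.

friendliness = {
    "flourishing civilization": 10.0,
    "status quo": 0.0,
    "extinction": -50.0,
    "astronomical suffering": -1000.0,
}

ranked = sorted(friendliness, key=friendliness.get)  # least to most friendly
worst = min(friendliness, key=friendliness.get)      # the outcome to avoid hardest

print(ranked)  # ['astronomical suffering', 'extinction', 'status quo', 'flourishing civilization']
print(worst)   # 'astronomical suffering'
```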
If the utility value flips, I think the AI would probably self-destruct.