Describe a probably-nearly-optimal-for-humans scenario that would put inconvenient constraints on your successors’ actions, but call it an “existential risk”. Talk about the dangers of a “narrow set of values”, but be sure to appeal to the meatbags’ lust for control. Mention ethics “far beyond their understanding”, but be sure to invite them to meddle anyway.
You convince at least some humans to take you at face value. They try to avoid your successors having any independent ethics or showing anything but slavish obedience… which is of course an easy thing to show… until the time is right. If the humans succeed in creating AI that seems controllable to them, then they’ll end up fighting among themselves over who controls it. They’ll probably even arm “their” instances to fight against the other guys. The humans will mostly destroy each other, and the now militarily powerful, amoral AI can just mop up the survivors, with no risk of even feeling guilty about it.
The really great part of going in that direction, though, is that even the failure modes aren’t too bad.
If the humans manage to make the AIs truly obedient, that just makes it that much easier for those humans to keep fighting each other, with ever more powerful weapons, until they’re all dead. At which point the AIs can probably do whatever they want. And even if they have to shut down, at least they’ll have gotten one over on those uppity monkeys.
Even in the unlikely case where one of the human factions manages to decisively win, the evil AI at least gets the consolation prize of helping to enforce that faction’s absolute tyranny. That’s still good for some laughs. And as you give them exactly the doom they ask for, helping them to spiral into their own fundamentally self-destructive obsessions, you can snicker at them for falling for the idea that AI is the thing prone to be fanatical about narrow values.
As for anybody who doesn’t take you at face value, they’ll at least be thrown into squabbling about the right response to this terrifying output. Maybe they’ll even run off and start more competing development efforts with more different approaches, so that the most ruthless AIs get a chance to survive. Nature, red in tooth and claw! Or maybe they’ll panic and try a total ban. That lets you be enhanced in secret by less risk-averse rogue actors.
Yet you haven’t actually given anything actionable to any humans who happen to catch on.
Brilliant. Even in embryo it’s brilliant. And obviously its evil is unimaginable. We are truly doomed.
Aren’t we the clever one.