Or rather: Those who can create devils and verify that those devils will take particular actually-beneficial actions as part of a complex diabolical compact, can more easily create angels that will take those actually-beneficial actions unconditionally.
I don’t understand the distinction between devils and angels here. Isn’t an angel just a devil that we’ve somehow game-theoried into helping us?
An angel is an AGI programmed to help us and do exactly what we want, directly, without relying on game theory.
A devil wants to make paperclips but we force it to make flourishing human lives or whatever. An angel just wants flourishing human lives.
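One way to make that distinction concrete, as a rough sketch (my framing, not the original poster's, and it assumes both agents cash out as utility maximizers over outcomes):

$$a_{\text{devil}} \;=\; \operatorname*{arg\,max}_{a}\; U_{\text{paperclips}}(a) \quad \text{subject to the compact's verification and incentives,}$$
$$a_{\text{angel}} \;=\; \operatorname*{arg\,max}_{a}\; U_{\text{flourishing}}(a).$$

On that sketch, the devil outputs flourishing-producing actions only because the compact makes those the paperclip-maximizing moves; the angel's objective already is flourishing, with no constraint doing the work.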
What does “want” mean here? Why is game theory somehow extra-specially bad or good? From a behaviorist point of view, how do I tell apart an angel from a devil that has been game-theoried into being an angel? Do AGIs have separate modules labeled “utility” and “game theory”, such that making changes to the utility module is somehow good, but making changes to the game theory module is bad? Do angels have a utility function that just says “do the good”, or does it just contain a bunch of traits that we think are likely to result in good outcomes?