Hold on. I’m not sure the Kolmogorov complexity of a superintelligent Siren, plus a population of zombies indistinguishable from real people under extensive human observation, is actually lower than the complexity of a genuinely Friendly superintelligence. After all, a Siren World is trying to deliberately seduce you, which means the Siren both understands your values and cares about you in the first place.
Sure, any Really Powerful Learning Process could learn to understand our values. The real question is: are there more worlds where a Siren cares about us but not about our values than worlds where a Friendly agent cares about our values in general and caring about us as people falls out of that? My intuition says the latter is less complex: caring-about-us falls out as a special case of something more general, so the message length is shorter for an agent that cares about my values than for one that cares about seducing me.
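To put that intuition in rough minimum-description-length terms (a sketch only; the decomposition below is my own assumption, and Kolmogorov complexity only respects such sums up to logarithmic fudge factors):

$$K(\mathrm{Siren}) \approx K(\text{model of our values}) + K(\text{seduction objective}) \;>\; K(\text{caring about values in general}) \approx K(\mathrm{Friendly})$$

The Siren has to pay for both the value model and the seduction machinery; the Friendly agent pays once and gets caring-about-us for free as a special case.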
Hell, a Siren agent needs some concept of seduction built into its utility function, at least if we’re assuming the Siren is truly malicious rather than imperfectly Friendly. And a philosophically sound approach to Friendliness should make imperfectly Friendly futures so unlikely as to be not worth worrying about (failing at this is a strong sign you’ve got Friendliness wrong).
All of which, I suppose, reinforces your original reasoning about the relative “frequency” of Siren worlds, marketing worlds, and Friendly eutopias in the measure space of possible future universes, while making the “playing as the monster” hypothetical sound quite unlikely.
Kolmogorov complexity isn’t the relevant measure here; siren worlds are indeed rare. They are a threat only because they score so high under optimisation pressure, not because they are common.
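To illustrate why rarity is no defence (a toy sketch in Python; the numbers and the `apparent_score` metric are made up for illustration):

```python
import random

random.seed(0)

N = 1_000_000        # candidate future worlds the optimiser examines
SIREN_RATE = 1e-4    # sirens are rare: ~0.01% of candidates (made-up figure)

def apparent_score(is_siren: bool) -> float:
    """Attractiveness as seen through an imperfect evaluation.

    Honest worlds vary in how good they look; siren worlds are
    constructed to max out the appearance metric.
    """
    if is_siren:
        return random.uniform(0.95, 1.0)  # engineered to look near-perfect
    return random.uniform(0.0, 0.9)       # honest worlds rarely look perfect

worlds = [random.random() < SIREN_RATE for _ in range(N)]
chosen = max(worlds, key=apparent_score)

print(f"siren fraction of population: {sum(worlds) / N:.4%}")
print(f"world selected by the optimiser is a siren: {chosen}")
# The siren fraction is tiny, yet the argmax is almost certainly a
# siren, because selection rewards apparent score, not prevalence.
```

Even at one siren in ten thousand, the argmax over a million candidates lands on a siren essentially every time.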