I completely agree that our behaviour doesn’t maximise the outer goal. My mysteriously capitalised “Pretty Good” was intended to point in this direction—that I find it interesting that we still have some kids, even when we could have none and still have sex and do other fun things. Declining populations would also point to worse alignment. I would consider proper bad alignment to be no kids at all, or the destruction of the planet and our human race along with it, although my phrasing, and thinking on this, is quite vague.
There is an element of unsustainability in your strategy for maximal gene spreading: if everyone were constantly doing everything they could to spread their genes as much as possible, in the ways you describe, humanity as a whole might not survive, spreading no genes at all. But even if it would be unsustainable for everyone to do the things you described, a few more people could do them, spread their genes far and wide, and society would keep ticking along. Or everyone could have just a few more children and things would probably be fine in the long term. I would say that men getting very little satisfaction from sperm donation is a case of misalignment: a deep mismatch between our “training” ancestral environments and our “deployment” modern world.
So I agree we don’t maximise the outer goal, especially now that we know how not to. One of the things that made me curious about this whole thing is that this characteristic, some sort of robust goal-following without maximising, seems like something we would desire in artificial agents. Reading through all these comments is crystallising in my head what my questions on this topic actually are:
Is this robust non-maximalness an emergent quality of some or all very smart agents? - I doubt it, but it would be nice, as it would reduce the chances that we get turned into paperclips.
Do we know how to create agents that exhibit these characteristics that I think are positive? - I doubt it, but it might be worth figuring out. An AGI that follows its goals only to some sustainable, reasonable degree seems safer than the AGI equivalent of the habitual sperm donor.
Is this robust non-maximalness an emergent quality of some or all very smart agents?
Yeah, I suspect it’s actually pretty hard to get a mesa-optimizer which maximizes some simple, internally represented utility function. I am seriously considering a mechanistic hypothesis where “robust non-maximalness” is the default. That, on its own, does not guarantee safety, but I think it’s pretty interesting.
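To make the maximiser / non-maximiser contrast a bit more concrete, here is a toy sketch. It is entirely my own illustration, not anyone's actual proposal: the actions, the utilities, and the satisficing threshold are all made up. It just contrasts an agent that strictly maximises an internally represented utility with one that "follows the goal only to some sustainable, reasonable degree" by picking any good-enough option, which is one crude way to cash out robust non-maximalness.

```python
# Toy sketch (illustration only): a maximiser vs. a satisficer over a
# made-up "gene-spreading" utility. All names and numbers are invented.
import random

# Hypothetical actions with invented utilities for the outer goal.
ACTIONS = {
    "raise_two_kids": 2.0,
    "raise_three_kids": 3.0,
    "donate_sperm_weekly": 50.0,  # extreme option: high utility, arguably unsustainable
    "no_kids": 0.0,
}

def maximiser(actions):
    """Pick the single action with the highest utility, however extreme."""
    return max(actions, key=actions.get)

def satisficer(actions, threshold=2.0):
    """Pick any action whose utility is 'good enough' (at or above the threshold)."""
    good_enough = [a for a, u in actions.items() if u >= threshold]
    return random.choice(good_enough)

if __name__ == "__main__":
    print("maximiser picks:", maximiser(ACTIONS))    # always the extreme option
    print("satisficer picks:", satisficer(ACTIONS))  # some sustainable option
```

The satisficer still pursues the goal, it just stops distinguishing between "good enough" and "as much as physically possible", which is roughly the human pattern described above. It is not a safety guarantee on its own, but it is the kind of behaviour the questions are pointing at.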