Human sexuality as an interesting case study of alignment
This is cross-posted from my personal blog
Epistemic status: mostly interesting questions.
Here I want to bring attention to what I think is an extremely impressive case of evolution’s ability to ‘align’ humans in the wild: the development of human sexuality.
Reasons why this is an interesting thing to study through the lens of alignment, and why it is a highly non-trivial accomplishment:
1.) Evolution has been very successful here: almost all humans end up wanting to have sex, typically with opposite-gender partners in a way that would result in children (and hence inclusive genetic fitness, IGF) in the evolutionary environment.
2.) Sexuality, unlike many other drives such as hunger and thirst, is not something built into the brain from the beginning. Instead there is a sudden ‘on switch’ around puberty. What happens in the brain during this time? How does evolution exert such fine-grained control of brain development so long (a decade or more) after birth?
3.) It is mostly independent of initial training data before puberty, i.e. evolution can ignore a decade of data input and representation learning, which it cannot control, during a period when the brain is undergoing extremely large changes, and still find a way to instill a new drive highly reliably.
4.) It seems to occur mostly without RL. People start wanting to have sex before they have actually had sex. If sexuality developed by some RL mechanism, it would look like you go around doing your normal things, then at some point you have sex, and realize it is highly rewarding, and you slightly update your behaviours and/or values to get more sex or to want more sex. This is not what happens in humans. Instead, humans often want to start having sex before they have had it, and even before they really know what sex is[1].
5.) Evolution has solved some variant of the pointers problem: it gets humans both to assign high value to a previously unknown and mostly non-represented state (i.e. you don’t usually have a well-represented sex concept before puberty) and to translate this desire onto specific other agents in the world, such as crushes, hot people etc. This is done, presumably, in an entirely genetically mediated way without requiring specific experience.
6.) Sexuality is usually a very strong drive with a large influence over behaviour and long-term goals. If we could create an equally strong drive towards alignment in our AGI, we would be in a good position.
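To make the contrast in point 4 concrete, here is a toy sketch of the two pictures. Everything in it is an illustrative assumption rather than a model of the brain: the function names, the onset age of 12, the drive strength of 1.0 and the learning rate are all made up. The only point is that a purely experience-driven value estimate stays at zero until the reward has actually been experienced, whereas a hard-wired drive can switch on at a fixed age with no experience at all.

```python
def learned_value(rewards_experienced, learning_rate=0.1):
    """RL-style estimate: the value only moves once reward has been experienced."""
    value = 0.0
    for reward in rewards_experienced:
        # Standard incremental update towards each observed reward.
        value += learning_rate * (reward - value)
    return value


def innate_drive(age, onset_age=12, strength=1.0):
    """Hypothetical hard-wired drive: switches on at onset_age, whatever the experience."""
    return strength if age >= onset_age else 0.0


def desire(age, rewards_experienced):
    """Total desire = innate component + whatever has been learned from experience."""
    return innate_drive(age) + learned_value(rewards_experienced)


if __name__ == "__main__":
    no_experience = []
    print(desire(10, no_experience))  # before onset, no experience
    print(desire(14, no_experience))  # after onset, still no experience
    print(desire(14, [1.0, 1.0]))     # after onset, with some rewarding experience
```

In the pure RL picture only `learned_value` exists, so desire with no experience would be zero at every age. What humans seem to show looks more like the `innate_drive` term, with experience modulating it afterwards.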
Some other aspects of the phenomenon that may be interesting to alignment:
1.) Clearly, the alignment in this case is not perfect. Assuming that what evolution ‘wants’ is child-bearing heterosexual sex, then human sexuality has a large number of deviations from this in practice including homosexuality, asexuality, and various paraphilias[2].
2.) As the worldwide demographic shift shows, the link between sex and children has largely been broken off-distribution, but our desire to have sex has largely stayed aligned even significantly off-distribution. This may change in the future with fully realistic VR simulated sex, but the drive has so far been remarkably resilient to current internet pornography.
3.) Further evidence against the RL explanation is that people often still desire sex even if their initial experiences are negative. However, severe abuse and similar experiences can often (but not always) have significant and long-lasting effects, which suggests that the intrinsic drives can be modulated by RL-ish effects.
4.) Specific sexual behaviours can be significantly influenced by culture and hence by environmental training data. This means that strong intrinsic drives can still be modulated by experience somehow, probably by shifting the representation of the concept being pointed at by the drive.
5.) While evolution gives us a strong aligned desire to have sex, this is clearly not coupled with a strong ability to actually obtain sex from scratch; instead we must learn the required skills with a standard RL-ish approach. This, to me, implies that the information content of the drive is relatively low (much lower than that of all the relevant skills), which is why it can be genetically encoded so well, and hence that the drive must be relatively simple. Another argument for this is that many of the changes associated with the development of sexuality around puberty are driven by hormones, which are incredibly low-bandwidth and extremely diffuse signals that cannot implement precise synaptic wiring changes.
The fact that evolution has managed to figure out a way to give humans such a reliable sex drive under these circumstances is rather remarkable and a reasonable test case of alignment. Understanding how this mechanism works, as well as where it goes wrong (from an evolutionary perspective), seems like it could provide one potential mechanistic route for aligning our own systems. We also likely have a good deal more control over any potential AGI we build, during design and training and especially after deployment, than evolution does over humans. Moreover, this gives an existence proof that developing such a relatively aligned and robust drive is possible even in relatively black-box RL systems like the brain.
[1] This was at least the case before internet porn became ubiquitous and easily accessible. Surprisingly, to me, porn has had relatively little apparent effect upon sexual behaviour in general considering how far off-distribution it is, which speaks again to the robustness of humans’ innate sex drives.
[2] To defend evolution here, it must be noted that these are only small proportions of the population, so evolution’s ‘alignment’ to heterosexual sex has succeeded in >95% of cases. If we had odds this good on AI alignment I would be pretty happy.