If natural selection had been a foresightful, intelligent kind of engineer that was able to engineer things successfully, it would have built us to be revolted by the thought of condoms
On the one hand, sperm banks aren’t very popular, and they “should” be, according to the “humans are fitness maximizers” model. People do eat more ice cream than is good for them, and “Shallowly following drives and not getting to the original goal that put them there” is definitely a thing that happens a lot.
On the other hand, this shallow model misses a lot. Prostitution may be more popular than sperm donation, but for how powerful our drive for sex is, it doesn’t add up. Condoms do reduce physical sensation somewhat, but not enough to explain the visceral revulsion that so many men (and some women) have regarding their use—it’s just that sometimes the desire for “sex” is stronger. In order to make these things match, you have to model sex with condoms and especially sex with prostitutes as not really counting as “sex” in the way that is desired. And then you get the interesting question of “Then what does count, and why?”.
In a fairly literal sense, for every example you can show me of a person chasing sex in a way that doesn’t fit with “fitness maximizing”, I can show you an example of someone chasing sex in a way that doesn’t fit with “shallow impulse chasing”. Sometimes it’s fairly blatant, like “Oh, they ‘thought’ she was infertile so they ‘weren’t being very careful’, interesting” (I know 3 kids conceived in this way). Other times it’s more subtle, like “She is on hormonal birth control, is insisting on condom use as well, and is adamant about not wanting a kid right now”—and she genuinely believes this, and is right in that her CEV points that way… and yet an actual exploration of her desires will find that when faced with an artificial “would you rather” separating what she thought she wanted from the things that lead towards pregnancy, you get 100% of the desire in the “surprising” direction.
It’s not that we’re aligned towards “shallow things” that are “the wrong thing” from evolution’s perspective, it’s that we’re just not that aligned, period. We’re incoherent. And it’s not that we can’t learn to align to whatever our coherent extrapolated volition turns out to be, it’s that we haven’t—not completely. It’s a lot of work to figure out wtf we actually want, so building global coherence is slow. I recognize that I haven’t done the necessary work to justify it here, and that the claim is quite controversial, but when you actually give people the experiences they need in order to see that their shallow desires aren’t meeting their deeper goals, preferences change. People stop eating so much ice cream, or having any particular interest in sex with condoms, etc. It’s about figuring out how to systematically do that, and doing enough of it.
The use of human failures of alignment as an analogy for AI obviously has its limits, even if it’s not obvious exactly where they are. However, so long as we’re exploring the analogy, things are much more optimistic than “grab randomly from mind design space, and hope it’s Friendly”. In humans, alignment failure doesn’t mean “Oh no, the process of loading terminal goals went wrong!”, it means that the process of building towards terminal goals got stopped somehow. And the earlier it happens, the more of an incoherent mess you have, and the less able it is to do anything particularly harmful. In order to make a serial killer you have to get quite a few things right, and in order to make a Hitler you have to get even more right—but not completely right, or else you wouldn’t have these murderous desires (the incoherences are actually visible when you know how to spot them). Screw up more and you end up a hobo preaching to be the DNA your vegetable is, or if you screw up a little less, maybe a petty thief feeding a drug habit.
This is completely separate from the problem of “Wtf happens when you raise an intelligence—human or otherwise—so powerful that indifference to the wellbeing of humans cannot get socialized into prosocial desires, and that humans cannot harm it enough to form malevolent desires?”, but it’s still worth noting and understanding.
This bit got me to laugh out loud. Who’s ever heard a man complain about having to use a condom?