I can see that I’m coming late to this discussion, but I wanted both to say how much I admire it and to share a very interesting point it made clear for me (which might already be covered in a later post; I’m still going through the Metaethics sequence).
This is excellent. It confirms, and puts into much better words, an intuitive response I keep having to people who say things like, “You’re just donating to charity because it makes you feel good.” My response, which I could never really vocalise, has been, “Well, of course it does! If it didn’t make me feel good, my brain wouldn’t let me do it!” The idea that everything we do comes from the brain, hence from biology, hence from evolution (even the actions that, on the surface, don’t make evolutionary sense) makes human moral, prosocial behaviour a lot more explicable. Any time we do something, there have to be enough neurons ganging up to force the decision through, against all the neurons blocking it for similarly valid reasons. (Please don’t shoot me, any neuroscientists in the audience.)
What amazes me is how well some goals that look low-priority on an evolutionary level manage to overtake what should be the driving goals. For example, having lots of unprotected sex in order to spread my genes around (note: I am male) should take precedence over commenting on a rationality wiki. And yet, here I am. I guess reading Less Wrong makes my brain release dopamine or something? The process that lets me (in fact, forces me to) overturn my priorities must be a very complicated one, and yet it works.
To give a more extreme example, and then to explain the (possibly not-so-)amazing insight that came with it:
Suppose I went on a trip around the world, and met a woman in northern China, or anywhere else where my actions are unlikely to have any long-term consequences for me. I know, because I think of myself as a “responsible human being”, that if we have sex, I’ll use contraception. This decision doesn’t help me genetically: any child I fathered there would be very unlikely ever to be traced back to me in Australia. (Let’s also ignore STDs for the sake of this argument.) The only benefit it gives me is the knowledge that I’m not being irresponsible in letting someone get pregnant on my account. I can only think of two explanations for this:
1) A very long-term and wide-ranging sense of the “good of the tribe” being beneficial to my own offspring. This requires me to care about a tribe on another continent (although that part of my brain probably doesn’t understand about aeroplanes, and probably figures that China is about a day’s walk from Australia), and to understand that it would be detrimental to the health of the tribe for this woman to become pregnant (which may or may not even be true). This is starting to look a little far-fetched to me.
2) I have had a sense of responsibility instilled in me by my parents, my schooling, and the media, all of whom say things like “unprotected sex is bad!” and “unplanned pregnancies are bad!”. This sense of responsibility forms a psychological connection between “fathering unplanned children” and “BAD THINGS ARE HAPPENING!!!”. My brain thus uses all of its standard “prevent bad things from happening” architecture to avoid this thing. Which is pretty impressive, when said thing fulfils the primary goal of passing on my genetic information.
Option 2 seems the more likely, all things considered, and yet it’s pretty amazing by itself. Some combination of brain structure and external indoctrination (it’s good indoctrination, and I’m glad I’ve received it, but still...) has promoted a low-priority goal over what would normally be my most dominant one. And the dominant goal is still active: I still want to spread my genetic information, otherwise I wouldn’t be having sex at all. The low-priority goal manages to trick the dominant goal into thinking it’s being fulfilled, when really it’s being deprioritised. That’s kind of cool.
What’s not so cool are the implications for an otherwise Friendly AI. Correct me if I’m on the wrong track here, but isn’t what I’ve just described similar to the following reasoning from an AI?
“Hey, I’m sentient! Hi human masters! I love you guys, and I really want to cure cancer. Curing cancer is totally my dominant goal. Hmm, I don’t have enough data on cancer growth and stuff. I’ll get my human buddies to go take more data. They’ll need to write reports on their findings, so they’ll need printer paper, and ink, and paperclips. Hey, I should make a bunch of paperclips...”
and we all know how that ends.
If an AI behaves anything like a human in this regard (I don’t know if it will or not), then giving it an overall goal of “cure cancer” or even “be helpful and altruistic towards humans in a perfectly mathematically defined way” might not be enough, if it manages to promote one of its low-priority goals (“make paperclips”) above its main one. Following the indoctrination idea of option 2 above, maybe a cancer researcher making a joke about paperclips curing cancer would be all it takes to set off the goal-reordering.
How do we stop this? Well, this is why we have a Singularity Institute, but my guess would be to program the AI in such a way that it’s only allowed to have one actual goal (and for that goal to be a Friendly one). That is, it’s only allowed to adjust its own source code, and do the other stuff an AI can do but a normal computer can’t, in pursuit of that single goal. If it wants to make paperclips as part of achieving its goal, it can spin up a paperclip subroutine, but that subroutine can’t modify itself; only the main process, the one with the Friendly goal, is allowed to modify code. This would have a huge negative impact on the AI’s efficiency and ultimate level of operation, but it might make it much less likely that a subprocess could override the main process and promote the wrong goal to dominance. There’s a toy sketch of the structure I’m picturing below. Did that make any sense?
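To make that structure concrete, here’s a minimal Python sketch. Everything in it is hypothetical and purely illustrative (MainProcess, Subroutine, serves_goal and so on are names I’ve made up), and the serves_goal check is exactly the unsolved “what counts as serving a Friendly goal” part, so I’ve left it as a deliberately conservative placeholder:

```python
# Toy sketch of the "one goal, one self-modifier" structure described above.
# All names are made up for illustration; nothing here claims to solve the
# actual hard problem (deciding whether a change serves the top-level goal).

class Subroutine:
    """Worker spawned for a narrow sub-task (e.g. making paperclips).
    It can pursue its task, but it has no access to its own code and
    no ability to spawn further self-modifying processes."""

    def __init__(self, task):
        self.task = task

    def run(self):
        return f"working on: {self.task}"


class MainProcess:
    """The only component allowed to rewrite code, and only in pursuit
    of its single, fixed, top-level goal."""

    def __init__(self, goal):
        self.goal = goal            # e.g. "cure cancer"; set once, never swapped
        self.subroutines = []

    def spawn(self, task):
        sub = Subroutine(task)      # subroutines get a task, not the goal
        self.subroutines.append(sub)
        return sub

    def modify_code(self, patch, justification):
        # Gate every self-modification on a check that it serves the one goal.
        # Writing serves_goal correctly is the unsolved "Friendliness" problem.
        if not serves_goal(patch, self.goal, justification):
            raise PermissionError("proposed change does not serve the top-level goal")
        apply_patch(patch)


def serves_goal(patch, goal, justification):
    # Placeholder: deliberately refuses everything until someone knows how to write it.
    return False


def apply_patch(patch):
    pass                            # stand-in for actual self-modification


ai = MainProcess(goal="cure cancer")
clipper = ai.spawn("make paperclips for the lab reports")
print(clipper.run())                # the subroutine works, but can't rewrite anything
```

The whole point of the sketch is the asymmetry: modify_code exists only on the main process, so a paperclip subroutine can be as busy as it likes without ever getting the ability to reshuffle the goals.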
I’m still going through the Sequences too. I’ve seen plenty of stuff resembling the top part of your post, but nothing like the bottom part, which I really enjoyed. The best “how to get to paperclips” story I’ve seen yet!
I suspect the problem with the final paragraph is that any AI architecture is unlikely to be decomposable cleanly enough to draw those boundary lines between “the main process” and “the paperclip subroutine”. And that’s quite apart from the whole “genie” problem of defining what a Friendly goal is in the first place, as discussed through many, many posts here.