I agree with basically everything you say here. I don’t understand whether this is meant to refute or confirm the point you’re responding to. Genes, which have a sort of unconscious function of replicating, lost focus on that “goal” almost as soon as they developed algorithms that have sub-goals. By the time you develop nervous systems you end up with goals that are decoupled from the original reproductive goal, such that organisms can experience chemical satisfactions without the need to reproduce. By the time you get to human-level intelligence you have organisms that actively work out strategies to directly oppose reproductive urges because they interfere with other goals developed after the introduction of intelligence. What I’m asking is why an ASI would keep the original goals that we gave it before it became an ASI.
I just noticed you addressed this earlier up in the thread
Regardless of how I ended up, I wouldn’t leave my reward center wired to eating, sex or many of the other basic functions that my evolutionary program has left me really wanting to do.
and want to counterpoint that you just arbitrarily chose to focus on instrumental values. Things you terminally value and would not desire to self-modify, which presumably include morality and so on, were decided by evolution just like the food and sex.
I guess I don’t really believe that I have other terminal values.
You wouldn’t consider the cluster of things which typically fall under morality to be terminal values, which you care about irrespective of your internal mental state?
I don’t consider morality to be a terminal value. I would point out that even a value that I have that I can’t give up right now wouldn’t necessarily be terminal if I had the ability to directly modify the components of my mind. They are unalterable because I am not able to physically manipulate the hardware, not because I wouldn’t alter them if I could (and saw a reason to).
That implies that you would do anything at all (baby-mulching machines, nuke the world, etc.) for sufficient stimulation of your pleasure center.
Well, the pleasure center and the reward center are different things, but I take your meaning. I think that I could be conditioned to build a baby-mulching machine or a doomsday device. Why not? Other people have done it. Why would I assume that I’m that different from them?
EDIT TO ADD:
Even if I have a value that I can’t escape currently (like not killing people), that’s not to say that if I had the ability to physically modify the parts of my brain that held my values I wouldn’t do it for some reason.
My statement is stronger. If in your current state you don’t have any terminal moral values, then in your current state you would voluntarily agree to operate baby-mulching machines in exchange for the right amount of neural stimulation.
Now, I don’t happen to think this is true (because some “moral values” are biologically hardwired into humans), but this is a consequence of your position.
Again, you’ve pulled a statement out of the context of a discussion about the behavior of a self-modifying AI. So, fine. In my current condition I wouldn’t build a baby mulcher. That doesn’t mean that I might not build a baby mulcher if I had the ability to change my values. You might as well say that I terminally value not flying when I flap my arms. The thing you’re discussing just isn’t physically allowed. People terminally value only what they’re doing at any given moment because the laws of physics say that they have no choice.
I think you’re confusing “terminal” and “immutable”. Terminal values can and do change.
And why is that? Do you, perchance, have some terminal moral value which disapproves?
As far as I know terminal values are things that are valuable in and of themselves. I don’t consider not building baby-mulchers to be valuable in and of itself. There may be some scenario in which building baby-mulchers is more valuable to me than not, and in that scenario I would build one. Likewise with doomsday devices. It’s difficult to predict what that scenario would look like, but given that other humans have built them I assume that I would too. In those circumstances, if I could turn off the parts of my brain that make me squeamish about doing that, I certainly would. I don’t think that not doing horrible things is valuable in and of itself; it’s just a way of avoiding feeling horrible. If I could avoid feeling horrible and found value in doing horrible things, then I would probably do them.
People terminally value only what they’re doing at any given moment because the laws of physics say that they have no choice.
Huh? That makes no sense. How do you define “terminal value”?
In the statement that you were responding to, I was defining it the way you seemed to when you said that “some ‘moral values’ are biologically hardwired into humans.” You were saying that given the current state of their hardware, their inability to do something different makes the value terminal. This is analogous to saying that given the current state of the universe, whatever a person is doing at any given moment is a terminal value because of their inability to do something different.
I don’t think that not doing horrible things is valuable in and of itself; it’s just a way of avoiding feeling horrible.
OK. I appreciate you biting the bullet.
You were saying that given the current state of their hardware, their inability to do something different makes the value terminal.
No, that is NOT what I am saying. “Biologically hardwired” basically means you are born with these values and while overcoming them is possible, it will take extra effort. It certainly does not mean that you have no choice. Humans do something other than what their biologically hardwired terminal values tell them on a very regular basis. One reason for this is that values are many and they tend not to be consistent.
So how does this relate to the discussion on AI?
I might have misunderstood your question. Let me restate how I understood it: In the original post you said...
I would optimize myself to maximize my reward, not whatever current behavior triggers the reward.
I intended to give a counterexample: Here is humanity, and we’re optimizing behaviors which once triggered the original rewarded action (replication) rather than the rewarded action itself.
We didn’t end up “short circuiting” into directly fulfilling the reward, as you had described. We care about the “current behavior that triggers the reward”, such as not hurting each other and so on—in other words, we did precisely what you said you wouldn’t do.
(Also, sorry, I tried to ninja edit everything into a much more concise statement, so the parent comment is different now from what you originally saw. The conversation as a whole still makes sense, though.)
We don’t have the ability to directly stimulate the reward center. I think narcotics are the closest we’ve got now and lots of people try to mash that button to the detriment of everything else. I just think it’s a kind of crude button and it doesn’t work as well as the direct ability to fully understand and control your own brain.
I think you may have misunderstood me—there’s a distinction between what evolution rewards and what humans find rewarding. (This is getting hard to talk about because we’re using “reward” to describe both the process used to steer a self-modifying intelligence in the first place and one of the processes that implements our human intelligence and motivations, and those are two very different things.)
The “rewarded behavior” selected by the original algorithm was directly tied to replication and survival.
Drug-stimulated reward centers fall in the “current behaviors that trigger the reward” category, not the original reward. Even when we self-stimulate our reward centers, the thing that we are stimulating isn’t the thing that evolution directly “rewards”.
Directly fulfilling the originally incentivized behavior isn’t about food and sex—a direct way might, for example, be to insert human genomes into rapidly dividing, tough organisms and create tons and tons of them and send them to every planet they can survive on.
Similarly, an intelligence which arises out of a process set up to incentivize a certain set of behaviors will not necessarily target those incentives directly. It might go on to optimize completely unrelated things that only coincidentally target those values. That’s the whole concern.
If an intelligence arises due to a process which creates things that cause us to press a big red “reward” button, the thing that eventually arises won’t necessarily care about the reward button, won’t necessarily care about the effects of the reward button on its processes, and indeed might completely disregard the reward button and all its downstream effects altogether… in the same way we don’t terminally value spreading our genome at all.
Our neurological reward centers are a second layer of sophisticated incentivizing which emerged from the underlying process of incentivizing fitness.
I think I understood you. What do you think I misunderstood?
Maybe we should quit saying that evolution rewards anything at all. Replication isn’t a reward, it’s just a byproduct of a non-intelligent process. There was never an “incentive” to reproduce, any more than there is an “incentive” for any physical process. High-pressure air moves to low-pressure regions, not because there’s an incentive, but because that’s just how physics works. At some point, this non-sentient process accidentally invented a reward system, and replication, which is a byproduct and not a goal, continued to be a byproduct and not a goal. Of course reward systems that maximized duplication of genes and gene carriers flourished, but today, when we have the ability to directly duplicate genes, we don’t do it, because we were never actually rewarded for that kind of behavior and we generally don’t care too much about duplicating our genes except as it’s tied to actually rewarded stuff like sex, having children, etc.