There is going to be value drift even if we get an FAI. Isn’t that inherent in extrapolated volition?
No. Progress and development may be part of human preference, but it is entirely OK for a fixed preference to specify progress happening in a particular way, as opposed to other possible ways. Furthermore, preference can be fixed and still not knowable in advance (so that there are no spoilers, and moral progress happens through your effort rather than being dictated “from above”).
It’s not possible to efficiently find out some properties of a program, even if you have its whole source code; this source code doesn’t change, but the program runs—develops—in novel and unexpected ways. Of course, the unexpected needs to be knowably good, not just “unexpected” (see for example Expected Creative Surprises).
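To make the “fixed source, unpredictable run” point concrete, here is a minimal sketch in Python (the example is mine, not from the original discussion): the Collatz rule is a few lines of code that never change, yet the step counts below can only be discovered by actually running the program, and whether it halts for every starting value remains an open problem.

```python
def collatz_steps(n: int) -> int:
    """Count iterations of the fixed Collatz rule until n reaches 1.

    The rule's "source code" never changes, yet no known analysis predicts
    these step counts (or even guarantees termination for every n) short of
    running the program itself.
    """
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps


if __name__ == "__main__":
    for start in (6, 7, 27, 97):
        print(start, "->", collatz_steps(start), "steps")
```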
I agree that such a fixed preference system is possible. But I don’t think that it needs to be implemented in order for “moral progress” to be indefinitely sustainable in a positive fashion. I think humans are capable of guiding their own moral progress without their hands being held. Will the result be provably friendly? No, of course not. The question is how likely the result is to be friendly, and whether this likelihood is great enough to offset the negatives associated with FAI research (namely, the potentially very long timescales needed).
I think humans are capable of guiding their own moral progress without their hands being held. Will the result be provably friendly? No, of course not. The question is how likely the result is to be friendly
The strawman of “provable friendliness” again. It’s not about holding ourselves to an inadequately high standard; it’s about figuring out what’s going on, in any detail. (See this comment.)
If we accept that preference is complex (it holds a lot of data), and that detail in preference matters (losing a relatively small portion of this data is highly undesirable), then any value drift is bad. And as long as value drift is not rigorously controlled, its random walk will carry preference further and further away from the initial preference. As a result, from the point of view of the initial preference, the far future is pretty much lost, even if no individual step of the way looks threatening. The future agency won’t care about the past preference, and won’t revert to it: as a result of value drift it already has a different preference, and for it, returning to the past is no longer preferable. This system isn’t stable; deviations in preference don’t correct themselves if the deviated-preference agency has control.
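A minimal toy model of the random-walk claim, sketched in Python (the vector representation of preference, its dimension, and the step size are arbitrary assumptions made purely for illustration): with nothing pulling preference back toward its starting point, the expected distance from the initial preference keeps growing, even though every individual step is tiny.

```python
import math
import random


def drift_distance(steps: int, dim: int = 50, step_size: float = 0.01) -> float:
    """Simulate uncorrected value drift as an unbiased random walk.

    Each generation perturbs every coordinate of a "preference vector" by a
    small random amount; nothing corrects it back toward the origin (the
    initial preference).
    """
    pref = [0.0] * dim
    for _ in range(steps):
        pref = [x + random.gauss(0.0, step_size) for x in pref]
    return math.sqrt(sum(x * x for x in pref))


if __name__ == "__main__":
    random.seed(0)
    for steps in (10, 100, 1000, 10000):
        dists = [drift_distance(steps) for _ in range(20)]
        print(steps, "steps: mean distance", round(sum(dists) / len(dists), 3))
```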
I fail to see how my post was a straw man. I was pointing out a deficiency in what I am supporting, not what you are supporting.
This system isn’t stable; deviations in preference don’t correct themselves if the deviated-preference agency has control.
I disagree that we know this. Certainly the system hasn’t stabilized yet, but how can you make such a broad statement about the future evolution of human preference? And, in any case, even if there were no ultimate attractor in the system, so what? Human preferences have changed over the centuries. My own preferences have changed over the years. I don’t think anyone is arguing this is a bad thing. Certainly, we may be able to build a system that replaces our “sloppy” method of advancement with a deterministic system that has an immutable set of preferences at its core. I disagree that this is necessarily superior to letting preferences evolve in the same way they have been, free of an overseer. But that disagreement of ours is still off topic.
The topic is whether FAI or WBE research is better for existential risk reduction. The pertinent question is what are the likelihoods of each leading to what we would consider a positive singularity, and, more importantly, how do those likelihoods change as a function of our directed effort?
I fail to see how my post was a straw man. I was pointing out a deficiency in what I am supporting, not what you are supporting.
It shouldn’t matter who supports what. If you suddenly agree with me on some topic, you still have to convince me that you did so for the right reasons, and didn’t accept a mistaken argument or a mistaken understanding of an argument (see also “belief bias”). If such a mistake is discovered, you’d have to take a step back, and we should both agree that that’s the right thing to do.
The “strawman” (probably the wrong term in this context) is in drawing a distinction between “friendliness” and “provable friendliness”. If you accept that the distinction is illusory, the weakness of non-FAI “friendliness” suddenly becomes “provably fatal”.
This system isn’t stable; deviations in preference don’t correct themselves if the deviated-preference agency has control.
I disagree that we know this. Certainly the system hasn’t stabilized yet, but how can you make such a broad statement about the future evolution of human preference?
Stability is a local property of a specific point: sufficiently small deviations from that point are followed by corrections back to it, so that the system remains indefinitely in close proximity to that point, provided it’s not disturbed too much.
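For reference, this is the standard (Lyapunov-style) formulation of that notion for a fixed point x* of a dynamical system, written out in LaTeX; the comment above uses it only loosely, by analogy.

```latex
% Lyapunov stability of a fixed point x^* :
% every sufficiently small initial deviation stays within any prescribed
% bound for all later times.
\forall \varepsilon > 0 \;\; \exists \delta > 0 : \quad
\lVert x(0) - x^* \rVert < \delta
\;\Longrightarrow\;
\lVert x(t) - x^* \rVert < \varepsilon \quad \text{for all } t \ge 0.
```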
When we replace ourselves with an agency of slightly different preference, this new agency has no reason to correct back toward our preference. If it is not itself stable (that is, it hasn’t built its own FAI), then the next preference shift it experiences (in effectively replacing itself with an agency of yet another preference) won’t be related to the first shift and won’t correct it. As a result, value is slowly but inevitably lost. This loss of value only stops when reflective consistency is finally achieved, but it won’t be achieved by an agency that exactly shares your preference. Thus, even once you’ve lost the fight for your preference specifically, the only hope is for the drifted agency with similar preference to stop as soon as possible (as close to your preference as possible) and develop its FAI. (See also: Friendly AI: a vector for human preference.)
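The “stop as soon as possible” point can be sketched by extending the toy random walk above (again in Python; the per-generation lock-in probability is an arbitrary assumption): drift only halts when reflective consistency is reached, and the later that lock-in happens, the further the resulting preference tends to be from the original one.

```python
import math
import random


def distance_at_lockin(p_lockin: float, dim: int = 50, step_size: float = 0.01,
                       max_steps: int = 100_000) -> float:
    """Random-walk value drift that halts ("achieves reflective consistency")
    with probability p_lockin per generation; returns the distance from the
    initial preference at the moment drift stops."""
    pref = [0.0] * dim
    for _ in range(max_steps):
        if random.random() < p_lockin:
            break
        pref = [x + random.gauss(0.0, step_size) for x in pref]
    return math.sqrt(sum(x * x for x in pref))


if __name__ == "__main__":
    random.seed(0)
    for p in (0.1, 0.01, 0.001):
        dists = [distance_at_lockin(p) for _ in range(200)]
        print("lock-in probability", p, "-> mean final distance",
              round(sum(dists) / len(dists), 3))
```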
My own preferences have changed over the years. I don’t think anyone is arguing this is a bad thing.
Past-you would prefer your preference not to change, even though current-you prefers your preference to be as it now is. Note that preference has little to do with likes or wants, so you might be talking about surface reactions to environment and knowledge, not the elusive concept of what you’d prefer in the limit of reflection. (See also: “Why Socialists don’t Believe in Fun”, Eutopia is Scary.)
The topic is whether FAI or WBE research is better for existential risk reduction. The pertinent question is what are the likelihoods of each leading to what we would consider a positive singularity, and, more importantly, how do those likelihoods change as a function of our directed effort?
And to decide this question, we need a solid understanding of what counts as a success or failure. The concept of preference is an essential tool in gaining this understanding.