I fail to see how my post was a straw man. I was pointing out a deficiency in what I am supporting, not what you are supporting.
This system isn’t stable, deviations in preference don’t correct themselves, if the deviated-preference agency has control.
I disagree that we know this. Certainly the system hasn’t stabilized yet, but how can you make such a broad statement about the future evolution of human preference? And, in any case, even if there were no ultimate attractor in the system, so what? Human preferences have changed over the centuries. My own preferences have changed over the years. I don’t think anyone is arguing this is a bad thing. Certainly, we may be able to build a system that replaces our “sloppy” method of advancement for a deterministic system with an immutable set of preferences at its core. I disagree this is necessarily superior to letting preferences evolve in the same way they have been, free of an overseer. But that disagreement of ours is still off topic.
The topic is whether FAI or WBE research is better for existential risk reduction. The pertinent question is what are the likelihoods of each leading to what we would consider a positive singularity, and, more importantly, how do those likelihoods change as a function of our directed effort?
I fail to see how my post was a straw man. I was pointing out a deficiency in what I am supporting, not what you are supporting.
It shouldn’t matter who supports what. If you suddenly agree with me on some topic, you still have to convince me that you did so for the right reasons, and didn’t accept a mistaken argument or mistaken understanding of an argument (see also “belief bias”). If such is to be discovered, you’d have to make a step back, and we both should agree that it’s the right thing to do.
The “strawman” (probably a wrong term in this context) is in making a distinction between “friendliness” and “provable friendliness”. If you accept that the distinction is illusory, the weakness of non-FAI “friendliness” suddenly becomes “provably fatal”.
This system isn’t stable, deviations in preference don’t correct themselves, if the deviated-preference agency has control.
I disagree that we know this. Certainly the system hasn’t stabilized yet, but how can you make such a broad statement about the future evolution of human preference?
Stability is a local property around a specific point, that states that sufficiently small deviations from that point will be followed by corrections back to it, so that the system will indefinitely remain in the close proximity of that point, provided it’s not disturbed too much.
Where we replace ourselves with agency of slightly different preference, this new agency has no reason to correct backwards to our preference. If it is not itself stable (that is, it hasn’t built its own FAI), then the next preference shift it’ll experience (in effectively replacing itself with yet different preference agency) isn’t going to be related to the first shift, isn’t going to correct it. As a result, value is slowly but inevitably lost. This loss of value only stops when the reflective consistency is finally achieved, but it won’t be by an agency that exactly shares your preference. Thus, even when you’ve lost a fight for specifically your preference, the only hope is for the similar-preference drifted agency to stop as soon as possible (as close to your preference as possible), to develop its FAI. (See also: Friendly AI: a vector for human preference.)
My own preferences have changed over the years. I don’t think anyone is arguing this is a bad thing.
The past-you is going to prefer your preference not to change, even though current-you would prefer your preference to be as it now is. Note that preference has little to do with likes or wants, so you might be talking about surface reactions to environment and knowledge, not the eluding concept of what you’d prefer in the limit of reflection. (See also: “Why Socialists don’t Believe in Fun”, Eutopia is Scary.)
The topic is whether FAI or WBE research is better for existential risk reduction. The pertinent question is what are the likelihoods of each leading to what we would consider a positive singularity, and, more importantly, how do those likelihoods change as a function of our directed effort?
And to decide this question, we need a solid understanding of what counts as a success or failure. The concept of preference is an essential tool in gaining this understanding.
I fail to see how my post was a straw man. I was pointing out a deficiency in what I am supporting, not what you are supporting.
I disagree that we know this. Certainly the system hasn’t stabilized yet, but how can you make such a broad statement about the future evolution of human preference? And, in any case, even if there were no ultimate attractor in the system, so what? Human preferences have changed over the centuries. My own preferences have changed over the years. I don’t think anyone is arguing this is a bad thing. Certainly, we may be able to build a system that replaces our “sloppy” method of advancement for a deterministic system with an immutable set of preferences at its core. I disagree this is necessarily superior to letting preferences evolve in the same way they have been, free of an overseer. But that disagreement of ours is still off topic.
The topic is whether FAI or WBE research is better for existential risk reduction. The pertinent question is what are the likelihoods of each leading to what we would consider a positive singularity, and, more importantly, how do those likelihoods change as a function of our directed effort?
It shouldn’t matter who supports what. If you suddenly agree with me on some topic, you still have to convince me that you did so for the right reasons, and didn’t accept a mistaken argument or mistaken understanding of an argument (see also “belief bias”). If such is to be discovered, you’d have to make a step back, and we both should agree that it’s the right thing to do.
The “strawman” (probably a wrong term in this context) is in making a distinction between “friendliness” and “provable friendliness”. If you accept that the distinction is illusory, the weakness of non-FAI “friendliness” suddenly becomes “provably fatal”.
Stability is a local property around a specific point, that states that sufficiently small deviations from that point will be followed by corrections back to it, so that the system will indefinitely remain in the close proximity of that point, provided it’s not disturbed too much.
Where we replace ourselves with agency of slightly different preference, this new agency has no reason to correct backwards to our preference. If it is not itself stable (that is, it hasn’t built its own FAI), then the next preference shift it’ll experience (in effectively replacing itself with yet different preference agency) isn’t going to be related to the first shift, isn’t going to correct it. As a result, value is slowly but inevitably lost. This loss of value only stops when the reflective consistency is finally achieved, but it won’t be by an agency that exactly shares your preference. Thus, even when you’ve lost a fight for specifically your preference, the only hope is for the similar-preference drifted agency to stop as soon as possible (as close to your preference as possible), to develop its FAI. (See also: Friendly AI: a vector for human preference.)
The past-you is going to prefer your preference not to change, even though current-you would prefer your preference to be as it now is. Note that preference has little to do with likes or wants, so you might be talking about surface reactions to environment and knowledge, not the eluding concept of what you’d prefer in the limit of reflection. (See also: “Why Socialists don’t Believe in Fun”, Eutopia is Scary.)
And to decide this question, we need a solid understanding of what counts as a success or failure. The concept of preference is an essential tool in gaining this understanding.