The first option tries to capture our best current guess as to our fundamental preference. It then updates the agent (us) based on that guess.
This guess may be awful. The process of emulation and attempts to increase the intelligence of the emulations may introduce subtle psychological changes that could affect the preferences of the persons involved.
For subsequent changes based on “trying to evolve towards what the agent thinks is its exact preference” I see two options. Either they are like the first change, open to being arbitrarily awful because we have little introspective insight into the nature of our preferences, so that step by step we lose part of what we value. Or the subsequent changes consist of formalizing and precisely capturing the object-level preference, in which case the situation must be judged by how much value was lost in the first step versus how much was gained by having emulations work on the project of formalization.
For the second option though, it’s hard for me to imagine ever choosing to self-modify into an agent with exact, unchanging preferences.
This is not the proposal under discussion. The proposal is to build a tool that ensures that things develop according to our wishes. If it turns out that our preferred (in the exact, static sense) route of development is through a number of systems that are not reflectively consistent themselves, then this route will be realized.
It may be horribly awful, yes. The question is “how likely is it to be awful?”
If FAI research can advance fast enough, then we will have the luxury of implementing a coherent preference system that will guarantee the long-term stability of our exact preferences. In an ideal world that would be the path we took. In the real world there is a downside to the FAI path: it may take too long. The benefit of other paths is that, although they would have some potential to fail even if executed in time, they offer a potentially faster timetable.
I’ll reiterate: yes, of course FAI would be better than WBE, if both were available. No, WBE provides no guarantee and could lead to horrendous preference drift. The questions are: How likely is WBE to go wrong? How long is FAI likely to take? How long is WBE likely to take? And, ultimately, combining the answers to those questions: where should we be directing our research?
Your post points out very well that WBE might go wrong. It gives no clue to the likelihood though.
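To make concrete how those answers might be combined, here is a minimal sketch of the kind of expected-value comparison I have in mind, in Python. All of the numbers are placeholders standing in for whatever estimates one actually holds; the point is the shape of the calculation, not the values.

```python
# Toy expected-value comparison of the two research paths.
# Every probability below is an illustrative placeholder, not an actual estimate.

def p_good_outcome(p_safe_if_completed, p_completed_in_time):
    """Probability the path both finishes before it is too late and preserves our preferences."""
    return p_safe_if_completed * p_completed_in_time

# Hypothetical inputs answering the questions above:
p_wbe_safe_given_done = 0.6   # 1 minus "how likely is WBE to go wrong?"
p_wbe_done_in_time    = 0.5   # from "how long is WBE likely to take?" vs. the risk clock
p_fai_safe_given_done = 0.95  # FAI, if completed correctly, is assumed to preserve preferences
p_fai_done_in_time    = 0.3   # from "how long is FAI likely to take?" vs. the risk clock

print("P(good outcome | WBE path):", p_good_outcome(p_wbe_safe_given_done, p_wbe_done_in_time))
print("P(good outcome | FAI path):", p_good_outcome(p_fai_safe_given_done, p_fai_done_in_time))
```

With these particular placeholders the two paths come out roughly even, which is exactly why the answer turns on which estimates you plug in.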
Good, this is progress. Your comment clarified your position greatly. However, I do not know what you mean by “how long is WBE likely to take?” — take until what happens?
The amount of time until we have high fidelity emulations of human brains. At that point we can start modifying/enhancing humans, seeking to create a superintelligence or at least sufficiently intelligent humans that can then create an FAI. The time from first emulation to superintelligence is nonzero, but is probably small compared to the time to first emulation. If we have reason to believe that the additional time is not small we should factor in our predictions for it as well.
My conclusion from this discussion is that our disagreement lies in the probability we each assign that uploads can be applied safely to building FAI, as opposed to generating more existential risk. I do not see how to resolve this disagreement right now. I agree with your statement that we need to make sure that those involved in running uploads understand the problem of preserving human preference.
I’m not entirely sure how to resolve that either. However, it isn’t necessary for us to agree on that probability to agree on a course of action.
What probability would you assign to uploads being used safely? What do your probability distributions look like for the ETA of uploads, FAI and AGI?
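One way to make that last question concrete is to write the distributions down explicitly and sample from them. A minimal Monte Carlo sketch in Python follows; the lognormal medians and spreads are made up purely for illustration and stand in for whatever estimates you actually hold.

```python
import math
import random

# Monte Carlo sketch for comparing arrival-time (ETA) estimates.
# The medians (years from now) and spreads below are made-up placeholders.

def sample_eta(median_years, spread):
    """Draw one arrival time from a lognormal distribution with the given median."""
    return random.lognormvariate(math.log(median_years), spread)

def p_a_before_b(median_a, spread_a, median_b, spread_b, trials=100_000):
    """Estimate the probability that technology A arrives before technology B."""
    wins = sum(
        sample_eta(median_a, spread_a) < sample_eta(median_b, spread_b)
        for _ in range(trials)
    )
    return wins / trials

# Hypothetical inputs; replace with your own medians and spreads.
print("P(uploads before FAI):", p_a_before_b(40, 0.5, 60, 0.8))
print("P(uploads before AGI):", p_a_before_b(40, 0.5, 35, 0.7))
```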