If you want to learn the fundamental concepts of a field, I find most of the time that textbooks with exercises are still the best option. The more introductory chapters of PhD theses are also helpful in this situation.
Thanks! Please keep on posting, this is interesting.
Since I never described a way of extracting preference from a human (and hence defining it for an FAI), I’m not sure where you see the regress in the process of defining preference.
Reading your previous post in this thread, I felt like I was missing something and I could have asked the question Wei Dai asked (“Once we implement this kind of FAI, how will we be better off than we are today?”). You did not explicitly describe a way of extracting preference from a human, but phrases like “if you manage to represent your preference in terms of your I/O” made it seem like capturing strategy was what you had in mind.
I now understand you as talking only about what kind of object preference is (an I/O map) and about how this kind of object can contain certain preferences that we worry might be lost (like considerations of faulty hardware). You have not said anything about what kind of static analysis would take you from an agent’s program to an agent’s preference.
There is also Shades, which lets you set a tint color and which provides a slider so you can move gradually between standard and tinted mode.
My conclusion from this discussion is that our disagreement lies in the probability we assign that uploads can be applied safely to FAI as opposed to generating more existential risk. I do not see how to resolve this disagreement right now. I agree with your statement that we need to make sure that those involved in running uploads understand the problem of preserving human preference.
People have a very feeble understanding of their own goals. Understanding is not required. Goals can’t be given “from the outside”; goals are what the system does.
Even if we have little insight into our goals, it seems plausible that we frequently do things that are not conducive to our goals. If this is true, then in what sense can it be said that a system’s goals are what it does? Is the explanation that you distinguish between preference (goals the system would want to have) and goals that it actually optimizes for, and that you were talking about the latter?
Good, this is progress. Your comment clarified your position greatly. However, I do not know what you mean by “how long is WBE likely to take?” — take until what happens?
The first option tries to capture our best current guess as to our fundamental preference. It then updates the agent (us) based on that guess.
This guess may be awful. The process of emulation and attempts to increase the intelligence of the emulations may introduce subtle psychological changes that could affect the preferences of the persons involved.
For subsequent changes based on “trying to evolve towards what the agent thinks is its exact preference”, I see two options. Either they are like the first change, open to the possibility of being arbitrarily awful because we have little introspective insight into the nature of our preferences, so that step by step we lose part of what we value; or they consist of the formalization and precise capture of the object preference, in which case the situation must be judged by how much value was lost in the first step versus how much value was gained by having emulations work on the project of formalization.
For the second option though, it’s hard for me to imagine ever choosing to self-modify into an agent with exact, unchanging preferences.
This is not the proposal under discussion. The proposal is to build a tool that ensures that things develop according to our wishes. If it turns out that our preferred (in the exact, static sense) route of development is through a number of systems that are not reflectively consistent themselves, then this route will be realized.
It’s not clear to me that this is the only way to evaluate my claim, or that it is even a reasonable way. My understanding of FAI is that arriving at such a resolution of human preferences is a central ingredient to building an FAI, hence using your method to evaluate my claim would require more progress on FAI.
If your statement (“The route of WBE simply takes the guess work out”) were a comparison between two routes similar in approach, e.g. WBE and neuroenhancement, then you could argue that a better formal understanding of preference would be required before we could use the idea of “precise preference” to argue for one approach or the other.
Since we are comparing one option which does not try to capture preference precisely with an option that does, it does not matter what exactly precise preference says about the second option: Whatever statement our precise preferences make, the second option tries to capture it whereas the first option makes no such attempt.
Here is a potentially more productive way of seeing this situation: We do want our current preferences to be made reality (because that’s what the term preference describes), but we do not know what our preferences look like, part of the reason being that we are not smart enough and do not have enough time to think about what they are. In this view, our preferences are not necessarily going to drift if we figure out how to refer to human preference as a formal object and if we build machines that use this object to choose what to do — and in this view, we certainly don’t want our preferences to drift.
On the other hand, WBE does not “simply take the guess work out”. It may be the case that the human mind is built such that “making people smarter” is feasible without changing preference much, but we don’t know that this is the case. As long as we do not have a formal theory of preference, we cannot strongly believe this about any given intervention – and if we do have such a theory, then there exist better uses for this knowledge.
“Too sensitively” above makes some sense to me intuitively, but if someone asks “too sensitive compared to what?” then I can’t really give an answer.
Too sensitive compared to how you would want to feel if you knew more about your preferences (how low worlds rank where the offense was made) and more about what the world is like, e.g. the state of mind of those making the perceived offense?
Not for the Sake of Happiness (Alone) is a response to this suggestion.
Here is a revised way of asking the question I had in mind: If our preferences determine which extraction method is the correct one (the one that results in our actual preferences), and if we cannot know or use our preferences with precision until they are extracted, then how can we find the correct extraction method?
Asking it this way, I’m no longer sure it is a real problem. I can imagine that knowing what kind of object preference is would clarify what properties a correct extraction method needs to have.
There seems to be a bootstrapping problem: In order to figure out what the precise statement is that human preference makes, we need to know how to combine preferences from different systems; in order to know how preferences should combine, we need to know what human preference says about this.
What did you get out of this talk apart from the importance of the (Bayesian) Occam’s razor?
Same here. Can’t make it this weekend; the following one is fine.
This is the clearest statement of the problem of FAI that I have read to date.
Then you’re arguing that, if your notion of “physically plausible environments” includes a certain class of adversely optimized situations, worst-case analysis won’t work because all worst cases are equally bad.
By “how well a system would do in physically plausible environments”, do you mean on average, worst case or something else?
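To make the distinction in the question concrete, here is a minimal Python sketch, assuming only that we can assign a single score to a system in each environment; the function names and the toy environment scores are my own illustration, not anything from the discussion above.

```python
# Minimal sketch: evaluating one system's score across a set of
# environments, either on average or in the worst case.
# The environments and scores below are illustrative placeholders.

def average_case(score, environments):
    """Mean performance over the given environments."""
    return sum(score(env) for env in environments) / len(environments)

def worst_case(score, environments):
    """Performance in the least favorable environment."""
    return min(score(env) for env in environments)

# Toy usage: a "system" scored by a lookup table over three environments.
scores = {"benign": 1.0, "noisy": 0.6, "adversarial": 0.0}
envs = list(scores)

print(average_case(scores.get, envs))  # ~0.53
print(worst_case(scores.get, envs))    # 0.0, dominated by the adversarial case
```

The sketch also shows the worry voiced above: once the environment set includes adversarially optimized situations, the worst-case number is pinned to the worst possible outcome and stops distinguishing between systems, while the average-case number still can.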
Fodor’s arguments for a “language of thought” make sense (see his book of the same name). In a nutshell, thought seems to be productive (out of given concepts, we can always construct new ones, e.g. arbitrary nestings of “the mother of the mother of …”), systematic (knowing certain concepts automatically leads to the ability to construct other concepts, e.g. knowing the concept “child” and the concept “wild”, I can also represent “wild child”), and compositional (e.g. the meaning of “wild child” is a function of the meaning of “wild” and “child”).
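To make the three properties concrete, here is a toy sketch (my own illustration, not Fodor’s formalism) in which concepts are composable values: nesting can be iterated indefinitely (productivity), the constructor for modified concepts yields “wild child” as soon as “wild” and “child” are available (systematicity), and the gloss of a complex concept is computed from the glosses of its parts (compositionality).

```python
# Toy illustration only: concepts as composable values.
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    name: str                  # a primitive concept, e.g. "child"

@dataclass(frozen=True)
class MotherOf:
    of: "Concept | MotherOf"   # recursive structure allows arbitrary nesting (productivity)

@dataclass(frozen=True)
class Modified:
    modifier: Concept          # e.g. "wild"
    head: Concept              # e.g. "child"

def describe(c) -> str:
    """The gloss of a complex concept is a function of the glosses of its parts (compositionality)."""
    if isinstance(c, Concept):
        return c.name
    if isinstance(c, MotherOf):
        return "the mother of " + describe(c.of)
    if isinstance(c, Modified):
        return describe(c.modifier) + " " + describe(c.head)
    raise TypeError(c)

print(describe(MotherOf(MotherOf(Concept("Alice")))))         # the mother of the mother of Alice
print(describe(Modified(Concept("wild"), Concept("child"))))  # wild child
```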