So enlighten me. Let’s hear some definitions. But without the insults.
You and your co-FAIers always talk about “human preferences”, as if that were a good thing. Yet you’re the same people who spend much of your time bemoaning how stupid humans are. Do you really believe that you have the same goals and the same ethics as all other humans, and the only thing that distinguishes you is intelligence? If so, then you can only be trying to preserve “values” such as “avoid pain”, “seek pleasure”, or “experience novelty”.
Humans have values made possible by their range of cognitive experiences. Yet the things we value most, like love and enjoyment, are evolutionarily recent discoveries. There are only 2 possible options: Either, in preserving human values, you wish to prevent, forever and all time, the development of any wider range of cognitive experiences and concomitant new values; or your notion of “human values” is so general as to encompass such new developments. If the latter, then you are seeking to preserve something even more general than “avoid pain” and “seek pleasure”, in which case you are really wasting everybody’s time.
There are only 2 possible options: Either, in preserving human values, you wish to prevent, forever and all time, the development of any wider range of cognitive experiences and concomitant new values; or your notion of “human values” is so general as to encompass such new developments.
It could encompass some such new developments but not others.
If it can encompass the development of new cognitive experiences, that means that the utility function is not expressed in terms of current cognitive experiences. So what is it expressed in terms of? And what is it preserving?
What Steven said. Of course, preference is not about preserving something we don’t want preserved, such as satisfaction of human drives as they currently are. Specifying the ways in which human values could grow is not vacuous, as some ways in which values could develop are better than others.
Back to definitions: human preference is whatever you (being a human) happen to prefer, on reflection. If you are right and stopping moral growth is undesirable (I agree), then by definition stopping moral growth is not part of human preference. And so on. Human preference is the specification at the top of meta, the one that describes all possible considerations about the ways in which all other relevant developments should happen.
I don’t think there is a top to meta; and if there is, there’s nothing human about it.
You are still speaking as if there were one privileged, appropriate level of analysis for values. In fact, as with everything expressed by human language, there are different levels of abstraction that are appropriate in different circumstances.
The question of how meta to go depends on the costs, the benefits, the certainty of the analysis, and other factors.
The question of how meta to go cannot be answered independently of the very sorts of values that the friendly AI and CEV are themselves supposed to arbitrate between. There is no way to get outside the system and answer it objectively.
human preference is whatever you (being a human) happen to prefer, on reflection.
That is not a definition, no matter how many times it’s been repeated. It’s a tautology. That is side-stepping the issue. You need to either start being specific about values, or stop asking people to respect Friendly AI and coherent extrapolated volition as if they were coherent ideas. I’ve been waiting years for an explanation, and yet these things are still developed only to the level of precision of a dope-fueled dormitory rap session. Yet, somehow, instead of being dismissed, they are accumulating more and more adherents, and being treated with more and more respect.
EDIT: I exaggerate. EY has dealt with many aspects of FAI. But not, I think, with the most fundamental questions such as whether it makes any sense to talk about human values, whether preserving them is a good thing to do, how to trade off the present versus the future, and what “saving the human race” means.
Human preference is whatever you (being a human) happen to prefer, on reflection.
That is not a definition, no matter how many times it’s been repeated. It’s a tautology.
Sane definitions usually are. I don’t claim to know all about what sort of thing human preference is, but the term is defined roughly this way. This definition is itself fuzzy, because I can only refer to intuitions about “on reflection”, “prefer”, etc., but can’t define their combination in the concept of human preference mathematically. The definition contains an implicit problem statement about formalizing the concept, but that formalization is the whole goal of preference theory, so one can’t expect it now. The term itself is useful because it gives a convenient way to refer to the object of study.
FAI theory is an important topic not because it contains many interesting non-trivial results (it doesn’t), but because the problem needs to be solved. So far, even a good problem statement that won’t scare away mathematicians is lacking.
It’s an important topic, but I feel that it may become an obstacle rather than a help towards the goal of avoiding AI catastrophe. It can be flypaper that catches people interested in the problem, then leaves them stuck there waiting for further clarifications from Eliezer that never come, instead of doing original work themselves, because they’ve been led to believe that FAI+CEV theory is more developed than it is.
I don’t think that was the intent, but it might be a welcome side-effect.
EY has little motivation to provide clarification, as long as people here continue to proclaim their faith in FAI+CEV. He’s said repeatedly that he doesn’t believe collaboration has value; he plans to solve the problem himself. Even supposing that he had a complete write-up on FAI+CEV in his hand today, actually publishing it could be a losing proposition in his eyes. It would encourage other people to do AI work and call it FAI (dangerous, I think he would say); it would make FAI no longer be the exclusive property of SIAI (a financial hazard); and it would reveal countless grounds for disagreement with his ideas and with his values.
Because I do believe in the value of collaboration, I would like to see more clarification. And I don’t think it’s forthcoming as long as people already give FAI+CEV the respect they would give a fully-formed theory.
Also, FAI+CEV is causing premature convergence within the transhumanist community. I know the standard FAI+CEV answers to a number of questions, and it dismays me to hear them spoken with more and more self-assurance by more and more smart people, when I know that these answers have weak spots that have been unexamined for far too long. It’s too soon for people to be agreeing this much on something that has been discussed so little.
Their mistake (though I agree with your impression). I started working on FAI as soon as I understood the problem (as one that does not have understanding “fuzzy AGI” as a useful subgoal), about a year ago, and the current blog sequence is intended to help others understand the problem.
On the other hand, what do you see as the alternative to this “flypaper”, or an improvement thereof towards more productive modes? Building killer robots as a career is hardly a better road.
Gee, how can I answer this question in a way that doesn’t oblige me to do work?
One thing is, as a community, to motivate Eliezer to tell us more about his ideas on FAI and CEV, and to answer questions about them, by making it apparent that continuing to take these ideas seriously depends on their continued development. I very much appreciate his writing out his recent sequence on timeless decision theory, so I don’t want to harp on this at present. And of course Eliezer has no moral obligation to respond to you (unless you’ve given him time or money). But I’m not speaking of moral obligations; I’m speaking of strategy.
Another is to begin working on these ideas ourselves. This is hindered by our lacking a way to talk about, say, “Eliezer’s CEV” vs. CEV in general, and by continuing to try to figure out what Eliezer’s opinion is (to get at the “true CEV theory”) instead of trying to figure out CEV theory independently. So a repeated pattern has been:
person P (as in, for instance, “Phil”) asks a question about FAI or CEV
Eliezer doesn’t answer
person P gives their interpretation of FAI or CEV on the point, possibly in a “this is what I think Eliezer meant” way, or else in a “these are the implications of Eliezer’s ideas” way
Eliezer responds by saying that person P doesn’t know what they’re talking about, and should stop presuming to know what Eliezer thinks
end of discussion
I’ve assumed that FAI was about guaranteeing that humans would survive and thrive, not about taking over the universe and forestalling all other possibilities.
Or does the former imply the latter?
It does, but your reaction to the latter is possibly incorrect. A singleton “forestalls” other possibilities no more than the laws of a deterministic world do, and you can easily have free will in a deterministic world. With a Friendly singleton it is only better: if you would regret strongly enough not having a particular possibility realized, it will be realized, or something better will be realized in any case. Not so in an unsupervised universe.
(See also: Preference is resilient and thorough, Friendly AI: a vector for human preference.)
With a Friendly singleton it is only better: if you would regret strongly enough not having a particular possibility realized, it will be realized, or something better will be realized in any case.
Please stop speaking of friendly AI as if it were magic, and could be made to accomplish things simply by definition.
Another thing I’ve heard too much of is people speaking as if FAI would be able to satisfy the goals of every individual. It has become routine on LW to say that FAI could satisfy “your” goals or desires. People’s goals and desires are at odds with each other, sometimes inherently.
It’s the nature of hypotheticals to accomplish the things they are defined as being able to accomplish. Friendly AI is just such a creature: one that is able and willing to accomplish the associated feats. If it’s not going to do so, it’s not a Friendly AI. If it’s not possible to make it so, Friendly AI is impossible. Even if provably impossible, it still has the property of being able to do these things, as a hypothetical.
A hypothetical is typically something that you define as an aid to reason about something else. It is very tricky to set up FAI as a hypothetical construct, when the possibility of FAI is what you want to talk about.
Here’s my problem. I want the underlying problems in the notion of what an FAI is to be resolved. Most of these problems are hidden by the definitions used. People need to think about how to implement the concept they’ve defined in order to see the problems with the definition.
A hypothetical is typically something that you define as an aid to reason about something else. It is very tricky to set up FAI as a hypothetical construct, when the possibility of FAI is what you want to talk about.
This is a typical move in problems of constructing a mathematical structure, for example in school compass-and-straightedge construction problems. First, you assume that you’ve done what you needed to do and figure out the properties this implies (requires); then you actually construct the structure and prove that it has the required properties. It’s also standard in decision theory to assume that you’ve taken a certain action and then look at what would follow, all in order to determine which action will actually be chosen (even though an action you won’t actually choose can never happen).
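To make that decision-theoretic move concrete, here is a minimal sketch in Python; the outcome world model and utility function are hypothetical stand-ins invented for illustration, not anything from FAI or preference theory:

```python
# A minimal sketch of the "assume the action, then see what follows" move:
# every candidate action is evaluated as if it had been taken, even though
# only one of them will actually be performed.

def outcome(action: str) -> str:
    """Hypothetical world model: what would happen if `action` were taken."""
    return {"stay": "status quo", "explore": "new experiences", "defect": "conflict"}[action]

def utility(result: str) -> float:
    """Hypothetical preferences over outcomes."""
    return {"status quo": 0.0, "new experiences": 1.0, "conflict": -1.0}[result]

def choose(actions):
    # Consider each action hypothetically and pick the one whose predicted
    # outcome is most preferred; the others are never actually taken.
    return max(actions, key=lambda a: utility(outcome(a)))

print(choose(["stay", "explore", "defect"]))  # prints "explore"
```

The only point is that reasoning about actions you will never take is routine, not paradoxical.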
Here’s my problem. I want the underlying problems in the notion of what an FAI is to be resolved. Most of these problems are hidden by the definitions used. People need to think about how to implement the concept they’ve defined in order to see the problems with the definition.
The most frequent problems with definitions are lack of relevance, or emptiness (which feeds into relevance), and in pathological cases a tendency to mislead. (There are many possible problems.) You might propose a better (more relevant, that is, more useful) definition, or prove that the defined concept is empty.
Yes—and the problem with friendliness is that it preserves human preference.
You do not understand what you are talking about. It’s not a problem by definition.
That’s ridiculous. If it’s not a problem by definition, it’s a useless concept.