We can’t agree on which political formations are more Friendly.
We also can’t agree on, say, the correct theory of quantum gravity. But reality is there and it works in some particular way, which we may or may not be able to discover.
The values of a friendly AI are usually assumed to be an idealization of universal human values. More precisely: when someone makes a decision, it is because their brain performs a particular computation. To the extent that this computation is the product of a specific cognitive architecture universal to our species (and not just the contingencies of their life), we could speak of “the human decision procedure”, an unknown universal algorithm of decision-making implicit in how our brains are organized.
This human decision procedure includes a method of generating preferences—preferring one possibility over another. So we can “ask” the human decision procedure “what would be the best decision procedure for humans to follow?” This produces an idealized decision procedure: a human ideal for how humans should be. That idealized decision procedure is what human ethics has been struggling towards, and that is where a friendly AI should get its values, and perhaps its methods, from.
It may seem that I am assuming rather a lot about how human decision-making works, but what I just described is only the simplest version of the idea. There may be multiple identifiable decision procedures in the human gene pool; the genetically determined part of the human decision procedure may be largely a template whose values are set by experience and culture; and there may be multiple conflicting equilibria at the end of the idealization process, depending on how it starts.
For example, egoism and altruism may be different computational attractors, both a possible end result of reflective idealization of the human decision procedure; in which case a “politicization” of the value-setting process is certainly possible—a struggle over initial conditions. Or it may be that once you really know how humans think—as opposed to just guessing on the basis of folk psychology and very incomplete scientific knowledge—it’s apparent that this is a false opposition.
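To make the talk of “computational attractors” concrete, here is a deliberately toy sketch in Python. It treats reflective idealization as nothing more than repeatedly applying a one-step update to a single number w (read it, purely for illustration, as a weight placed on others’ welfare). The update rule and the function names idealize_step and idealize are inventions of the sketch, not a model of human cognition; the only point is that the same reflection dynamics can settle at different endpoints depending on where they start.

```python
# A deliberately toy dynamic: reflective idealization modeled as repeatedly
# applying an "idealize one step" update to a single value parameter w.
# The double-well update rule below is an arbitrary stand-in, chosen only to
# show that such a process can have more than one stable endpoint.

def idealize_step(w: float, rate: float = 0.1) -> float:
    """One hypothetical reflection step; nudges w toward -1 or +1."""
    return w - rate * (w**3 - w)  # gradient step on the potential (w^2 - 1)^2 / 4

def idealize(w0: float, steps: int = 1000) -> float:
    """Iterate the reflection step until it has (approximately) converged."""
    w = w0
    for _ in range(steps):
        w = idealize_step(w)
    return w

# The same idealization dynamics, run from different starting values:
for start in (-0.2, 0.01, 0.3):
    print(f"start={start:+.2f} -> endpoint={idealize(start):+.3f}")
# start=-0.20 -> endpoint=-1.000   (one attractor; read it as "egoism")
# start=+0.01 -> endpoint=+1.000   (the other attractor; read it as "altruism")
# start=+0.30 -> endpoint=+1.000
```

In this toy picture, a “politicization” of the value-setting process is just a struggle over which basin of attraction the starting point falls into.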
Either way, what I’m trying to convey here is a particular spirit of approach to the problem of values in friendly AI: that the answers should come from a scientific study of how humans actually think, that the true ideals and priorities of human beings are to be found by studying the computational particulars of human thought, and that all our ideologies and moralities are just flawed attempts by this computational process to ascertain its own nature.
If such an idealization exists, that would of course be preferable.
I suspect it doesn’t, which may color my position here, but I think it’s important to consider the alternatives in case there is no generalizable ideal. Specifically, we should work from the opposing end and try to generalize from the specific instances: even if we can’t arrive at Strong Friendliness (the fully generalized ideal of human morality), we might still arrive at Weak Friendliness (some generalized ideal that is at least acceptable to a majority of people).
Because the alternative for those of us who aren’t neuroscientists is, as far as I can tell, to wait.
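And here is an equally toy sketch of that “opposing end” approach: start from a handful of invented individual value profiles, propose a crude compromise (their coordinate-wise mean), and check whether a majority finds it acceptable. The profiles, the numbers, and the “acceptable means close enough in value-space” criterion are all assumptions of the sketch, meant only to illustrate in miniature what settling for Weak Friendliness might look like.

```python
# Toy illustration of generalizing from specific instances toward something
# a majority can accept. Every profile, number, and the "acceptable = close
# enough" criterion are made-up assumptions for this sketch.

from statistics import mean

# Hypothetical individuals, each weighting three example values in [0, 1].
profiles = [
    {"fairness": 0.9, "liberty": 0.4, "tradition": 0.2},
    {"fairness": 0.6, "liberty": 0.8, "tradition": 0.3},
    {"fairness": 0.5, "liberty": 0.5, "tradition": 0.9},
    {"fairness": 0.8, "liberty": 0.6, "tradition": 0.1},
]

def distance(a: dict, b: dict) -> float:
    """Average absolute disagreement across the listed values."""
    return mean(abs(a[k] - b[k]) for k in a)

def acceptable_to(profile: dict, candidate: dict, tolerance: float = 0.25) -> bool:
    """An individual 'accepts' a candidate within their assumed tolerance."""
    return distance(profile, candidate) <= tolerance

# A crude compromise candidate: the coordinate-wise mean of the instances.
candidate = {k: mean(p[k] for p in profiles) for k in profiles[0]}

accepting = sum(acceptable_to(p, candidate) for p in profiles)
verdict = "weakly friendly (toy sense)" if accepting > len(profiles) / 2 else "no majority"
print(f"compromise accepted by {accepting} of {len(profiles)} profiles: {verdict}")
# -> compromise accepted by 3 of 4 profiles: weakly friendly (toy sense)
```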