I think that there’s a misunderstanding about CEV going on.
At some point, we have to admit that human intuitions are genuinely in conflict in an irreconcilable way.
I don’t think an AI would just ask us what we want, and then do what suits most of us. It would consider how our brains work, and exactly what shards of value make us up. Intuition isn’t a very good guide to what is the best decision for us—the point of CEV is that if we knew more about the world and ethics, we would do different things, and think different thoughts about ethics.
You might object that a person might fundamentally value something that clashes with my values. But I think this is not likely to be found on Earth. I don’t know what CEV would do with a human and a paperclip maximiser, but with just humans? We’re pretty similar.
But not similar enough, I’d argue. For example, I value not farming nonhuman animals and making sure significant resources go toward addressing world poverty, and not that many other people do. Hopefully CEV will iron that out so that this minority wins over the majority, but I don’t quite know how.
(Comment disclaimer: Yes, I am woefully unfamiliar with CEV literature and unqualified to critique it. But hey, this is a comment in discussion. I do plan to research CEV more before I actually decide to disagree with it, assuming I do disagree with it after researching it further.)
Okay.

Either, if we all knew more, thought faster, and understood ourselves better, we would decide to farm animals, or we wouldn’t. For people to be so fundamentally different that there would still be disagreement, they would need massively complex adaptations or mutations, which are vastly improbable. Even if someone sits down and thinks long and hard about an ethical dilemma, they can very easily be wrong. To say that an AI could not coherently extrapolate our volition is to say we’re so fundamentally unalike that we would not choose to work for a common good if we had the choice.
But why run this risk? The genuine moral motivation of typical humans seems to be weak. That might even be true of the people working for human and non-human altruistic causes and movements. What if what they really want, deep down, is a sense of importance or social interaction or whatnot?
So why not just go for utilitarianism? By definition, that’s the safest option for everyone to whom things can matter/be valuable.
I still don’t see what could justify coherently extrapolating “our” volition only. The only non-arbitrary “we” is the community of all minds/consciousnesses.
What if what they really want, deep down, is a sense of importance or social interaction or whatnot?
This sounds a bit like religious people saying, “But what if it turns out that there is no morality? That would be bad!” What part of you thinks that this is bad? Because that is what CEV is extrapolating. CEV is taking the deepest and most important values we have and figuring out what to do next. In principle, you couldn’t care about anything else.
If human values would endorse self-modification, then CEV would recognise this. CEV aims to do what we want most, and this is what we call ‘right’.
The only non-arbitrary “we” is the community of all minds/consciousnesses.
This is what you value, what you chose. Don’t lose sight of invisible frameworks. If we’re including all decision procedures, then why not computers too? This is part of the human intuition of ‘fairness’ and ‘equality’ too. Not the hamster’s.

Yes. We want utilitarianism. You want CEV. It’s not clear where to go from there.

FWIW, hamsters probably exhibit a sense of fairness too. At least rats do.
The point you quoted is my main objection to CEV as well.
You might object that a person might fundamentally value something that clashes with my values. But I think this is not likely to be found on Earth.
Right now there are large groups who have specific goals that fundamentally clash with some goals of those in other groups. The idea of “knowing more about [...] ethics” either presumes an objective ethics or merely points at you or where you wish you were.
The existence of moral disagreement is not an argument against CEV, unless all disagreeing parties know everything there is to know about their desires and are perfect Bayesians. Otherwise, people can be mistaken about what they really want, or what the facts prescribe (given their values).
‘Objective ethics’? ‘Merely points… at where you wish you were’? “Merely”!?
Take your most innate desires. Not ‘I like chocolate’ or ‘I ought to condemn murder’, but the most basic level (you’d need a neuroscientist to figure those out). Then take the facts of the world. If you had a sufficiently powerful computer, and you could input those values and plug in the facts, then the output would be the thing you most want to do.

That doesn’t mean acting on whichever urge is strongest; it takes into account the desires that make up your conscience, and the bit of you saying ‘but that’s not what’s right’. If you could perform this calculation in your head, you’d get the feeling of ‘Yes, that’s what is right. What else could it possibly be? What else could possibly matter?’ This isn’t ‘merely’ where you wish you were. This is the ‘right’ place to be.
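As a toy sketch of that picture (everything below is a made-up placeholder, not anything from the CEV literature): treat the base-level desires as a weighting over outcomes, treat the facts as a model of which outcomes each action leads to, and the “output” is just the action that scores highest under those weights.

```python
# Purely illustrative sketch: "plug in the values and the facts, read off the best action".
# Every name and number here is a hypothetical placeholder, not a real CEV algorithm.

def best_action(base_desires, world_model, actions):
    """base_desires: outcome -> weight given to it by the agent's most basic desires.
    world_model: action -> {outcome: probability}, standing in for "the facts of the world".
    Returns the action with the highest expected value under those basic desires."""
    def expected_value(action):
        return sum(prob * base_desires.get(outcome, 0.0)
                   for outcome, prob in world_model[action].items())
    return max(actions, key=expected_value)

# Hypothetical conscience-level weights, not surface urges:
base_desires = {"others_flourish": 1.0, "self_comfort": 0.3, "acted_unfairly": -2.0}
world_model = {
    "donate": {"others_flourish": 0.8, "self_comfort": 0.2},
    "hoard":  {"self_comfort": 0.9, "acted_unfairly": 0.5},
}
print(best_action(base_desires, world_model, ["donate", "hoard"]))  # -> donate
```

The only thing distinguishing this from “act on the strongest urge” is where the weights come from: they are meant to stand in for the conscience-level desires, not the momentary impulses.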
This reply is more about the meta-ethics, but for interpersonal ethics, please see my response to peter_hurford’s comment above.
Otherwise, people can be mistaken about what they really want, or what the facts prescribe (given their values).
The fact that people can be mistaken about what they really want is vanishingly small evidence that if they were not mistaken, they would find out they all want the same things.
A very common desire is to be more prosperous than one’s peers. It’s not clear to me that there is some “real” goal that this serves (for an individual); it could literally be a primary goal. If that’s the case, then we already have a problem: two people in a peer group cannot both get all they want if each wants to have more than the other. I can’t think of any satisfactory solution to this. Now, one might say, “well, if they’d grown up farther together this would be solvable”, but I don’t see any reason that should be true. People don’t necessarily grow more altruistic as they “grow up”, so it seems that there might well be no CEV to arrive at. I think, actually, a weaker version of the UFAI problem exists here: sure, humans are more similar to each other than UFAIs need be to each other, but they still seem fundamentally different in goal systems and ethical views, in many respects.
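A minimal illustration of why that kind of desire resists reconciliation whatever the facts turn out to be (toy numbers and names, purely hypothetical): if each of two peers wants strictly more than the other, no allocation of resources can satisfy both.

```python
# Toy check that two strictly positional desires ("I want more than my peer")
# are jointly unsatisfiable. Purely illustrative; the allocations are made up.
from itertools import product

def satisfied(own, peer):
    # A strictly positional desire: satisfied only by having more than the peer.
    return own > peer

allocations = product(range(0, 101, 10), repeat=2)  # (Alice's wealth, Bob's wealth)
both_happy = [(a, b) for a, b in allocations if satisfied(a, b) and satisfied(b, a)]
print(both_happy)  # -> []: no split of resources satisfies both
```

Giving everyone more doesn’t help; the conflict sits in the shape of the desires themselves, which is why extrapolation alone may not dissolve it.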
Objective? Sure, without being universal.

Human beings are physically/genetically/mentally similar within certain tolerances; this implies there is one system of ethics (within certain tolerances) that is best suited to all of us, which could be objectively determined by a thorough and competent enough analysis of humans. The edges of the bell curve on various factors might have certain variances. There might be a multi-modal distribution of fit (bimodal on men and women, for example), too. But, basically, one objective ethics for humans.
This ethics would clearly be unsuited for cats, sharks, bees, or trees. It seems vanishingly unlikely that sapient minds from other evolutions would be suited to such an ethics either. So it’s not universal; it’s not a code God wrote into everything. It’s just the best way to be a human... as humans exposed to it would in fact judge, because it’s fitted to us better than any of our current fumbling attempts.
Why not include primates, dolphins, rats, chickens, etc. into the ethics?

What would that mean? How would the chicken learn or follow the ethics? Does it seem even remotely reasonable that social behavior among chickens and social behavior among humans should follow the same rules, given the inherent evolutionary differences in social structure and brain reward pathways?
It might be that CEV is impossible for humans, but there’s at least enough basic commonality to give it a chance of being possible.
Why would the chicken have to learn to follow the ethics in order for its interests to be fully included in the ethics? We don’t include cognitively normal human adults because they are able to understand and follow ethical rules (or, at the very least, we don’t include them only in virtue of that fact). We include them because, as sentient beings, their subjective well-being matters to them. And thus we also include the many humans who are unable to understand and follow ethical rules. We ourselves, of course, would want to still be included if we lost the ability to follow ethical rules. In other words: moral agency is not necessary for the status of a moral patient, i.e., a being that matters morally.
The question is how we should treat humans and chickens (i.e. whether and how our decision-making algorithm should take them and their interests into account), not what social behavior we find among humans and chickens.
Constructing an ethics that demands that a chicken act as a moral agent is obviously nonsense; chickens can’t and won’t act that way. Similarly, constructing an ethics that demands humans value chickens as much as they value their own children is nonsense; humans can’t and won’t act that way. If you’re constructing an ethics for humans to follow, you have to start by figuring out humans.
It’s not until after you’ve figured out how much humans should value the interests of chickens that you can determine how much to weigh the interests of chickens in how humans should act. And how much humans should weigh the value of chickens is by necessity determined by what humans are.
Well, if humans can’t and won’t act that way, too bad for them! We should not model ethics after the inclinations of a particular type of agent; we should instead try to modify all agents according to ethics.
If we did model ethics after particular types of agent, here’s what would result: Suppose it turns out that type A agents are sadistic racists. So what they should do is put sadistic racism into practice. Type B agents, on the other hand, are compassionate anti-racists. So what they should do is diametrically opposed to what type A agents should do. And we can’t morally compare types A and B.
But type B is obviously objectively better, and objectively less of a jerk. (Whether type A agents can be rationally motivated (or modified so as) to become more B-like is a different question.)
Of course we can morally compare types A and B, just as we can morally compare an AI whose goal is to turn the world into paperclips and one whose goal is to make people happy.
However, rather than “objectively better”, we could be clearer by saying “more in line with our morals” or some such. It’s not as if our morals came from nowhere, after all.

See also: “The Bedrock of Morality: Arbitrary?”
Just to be clear, are you saying that we should treat chickens how humans want to treat them, or how chickens want to be treated? Because if the former, then yeah, CEV can easily find out whether we’d want them to have good lives or not (and I think it would see that we do).
But chickens don’t (I think) have much of an ethical system, and if we incorporated their values into what CEV calculates, then we’d be left with some important human values, but also a lot of chicken feed.
Thanks, Benito. Do we know that we shouldn’t have a lot of chicken feed? My point in asking this is just that we’re baking in a lot of the answer by choosing which minds we extrapolate in the first place. Now, I have no problem baking in answers—I want to bake in my answers—but I’m just highlighting that it’s not obvious that the set of human minds is the right one to extrapolate.
BTW, I think the “brain reward pathways” between humans and chickens aren’t that different. Maybe you were thinking about the particular, concrete stimuli that are found to be rewarding rather than the general architecture.
Human beings are physically/genetically/mentally similar within certain tolerances; this implies there is one system of ethics (within certain tolerances) that is best suited to all of us
It does not imply that there exists even one basic moral/ethical statement any human being would agree with, and to me that seems to be a requirement for any kind of humanity-wide system of ethics. Your ‘one size fits all’ approach does not convince me, and your reasoning seems superficial and based on words rather than actual logic.
All humans as they currently exist, no. But is there a system of ethics as a whole that humans, even while currently disagreeing with some parts of it, would recognize as so much better at doing what they really want from an ethical system that they would switch to it? Even in the main? Maybe, indeed, human ethics are so dependent on alleles that vary within the population and chance environmental factors that CEV is impossible. But there’s no solid evidence to require assuming that a priori, either.
By analogy, consider a person who in 1900 wanted to put together the ideal human diet. Obviously, the diets in different parts of the world differed from each other extensively, and merely averaging all of those that existed in 1900 would not be particularly conducive to finding an actual ideal diet. The person would have to do all the sorts of research that discovered the roles of various nutrients and micronutrients, et cetera. Indeed, he’d have to learn more about them than we currently know. And he’d have to work out the variations needed to respond to various medical conditions, and he’d have to consider flavor (both innate response pathways and learned ones), et cetera. And then there are the limits of what foods can be grown where, what shipping technologies exist, and how to approximate the ideal diet in differing circumstances.
It would be difficult, but eventually you probably could put together a dietary program (including understood variations) that would, indeed, suit humans better than any of the existing diets of 1900, both in nutrition and in pleasure. It wouldn’t suit sharks at all; it would not be universal nutrition. But it would be an objectively determined diet just the same.
The problem with this diet is that it wouldn’t be a diet; it would be many different diets. Lots of people are lactose intolerant and it would be stupid to remove dairy products from the diet of those who are not. Likewise, a vegetarian diet is not a “variation” of a non-vegetarian diet.
Also, why are you talking about 1900?
Maybe, indeed, human ethics are so dependent on alleles that vary within the population and chance environmental factors that CEV is impossible. But there’s no solid evidence to require assuming that a priori, either.
I think the fact that humans can’t agree on even the most basic issues is pretty solid evidence. Also, even if everyone had the same subjective ethics, this would still result in objective contradictions. I’m not aware of any evidence that this problem is solvable at all.

Not similar enough to prevent massive conflicts—historically.

Basically, small differences in optimisation targets can result in large conflicts.

And even more simply, if everyone has exactly the same optimisation target, “benefit myself at the expense of others”, then there’s a big conflict.

The existence of moral disagreement is not an argument against CEV, unless all disagreeing parties know everything there is to know about their desires and are perfect Bayesians. People can be mistaken about what they really want, or what the facts prescribe (given their values).
I linked to this above, but I don’t know if you’ve read it. Essentially, you’re explaining moral disagreement by positing massively improbable mutations, but it’s far more likely to be a combination of bad introspection and non-Bayesian updating.
Essentially, you’re explaining moral disagreement by positing massively improbable mutations [...]
Um, different organisms of the same species typically have conflicting interests due to standard genetic diversity—not “massively improbable mutations”.
Typically, organism A acts as though it wants to populate the world with its offspring, and organism B acts as though it wants to populate the world with its offspring, and these goals often conflict—because A and B have non-identical genomes. Clearly, no “massively improbable mutations” are required in this explanation. This is pretty much biology 101.
Typically, organism A acts as though it wants to populate the world with its offspring, and organism B acts as though it wants to populate the world with its offspring, and these goals often conflict—because A and B have non-identical genomes.
It’s very hard for A and B to know how much their genomes differ, because they can only observe each other’s phenotypes, and they can’t invest too much time in that either. So they will mostly compete even if their genomes happen to be identical.
The kin recognition that you mention may be tricky, but kin selection is much more widespread—because there are heuristics that allow organisms to favour their kin without the need to examine them closely—like: “be nice to your nestmates”.
Simple limited dispersal often results in organisms being surrounded by their close kin—and this is a pretty common state of affairs for plants and fungi.
Oops.

Yup, I missed something there.

Well, for humans, we’ve evolved desires that work interpersonally (fairness, desire for others’ happiness, etc.). I think that an AI which had our values written in would have no problem figuring out what’s best for us. It would say, ‘well, there is this complex set of values that sums up to everyone being treated well (or something), and so each party involved should be treated well.’
You’re right, though; I hadn’t formed a clear idea of how this bit worked. Maybe this helps?