I appreciate the time you’ve put into our discussion and agree it may be highly relevant. So far, it looks like each of us has, unfortunately, misread the other as proposing something they are not actually proposing. Let’s see if we can clear it up.
First, I’m relieved that neither of us is proposing to inform AI behavior with people’s shared preferences.
This is the discussion of a post about the dangers of terminology, in which I’ve recommended “AI Friendliness” as an alternative to “AI Goalcraft” (see separate comment), because I think unconditional friendliness toward all beings is a good target for AI. Your suggestion is different:
About terminology, it seems to me that what I call preference aggregation, outer alignment, and goalcraft mean similar things [...] I’d vote for using preference aggregation
I found it odd that you would suggest naming the AI Goalcraft domain “Preference Aggregation” after saying earlier that you are only “slightly more positive” about aggregating human preferences than you are about “terrible ideas” like controlling power according to utilitarianism or a random person. Thanks for clarifying:
I don’t think we should aim to guide AI behavior using shared preferences.
Neither do I, and for this reason I strongly oppose your recommendation to use the term “preference aggregation” for the entire field of AI goalcraft. Preference aggregation may be a useful tool in the toolkit, and I remain interested in related proposals, but it is far too specific, and it’s only slightly better than terrible as a way to craft goals or guide power.
there aren’t enough obvious, widely shared preferences for us to guide the AI with.
This is where I think the obvious and widely shared preference to be happy and not suffer could be relevant to the discussion. However, my claim is that happiness is the optimization target of people, not that we should specify it as the optimization target of AI. We do what we do to be happy. Our efforts are not always successful, because we also struggle with evolved habits like greed and anger, and our instrumental preferences aren’t always well informed.
You want an ASI to optimize everyone’s happiness, right?
No. We’re fully capable of optimizing our own happiness. I agree that we don’t want a world where AI force-feeds everyone MDMA or invades brains with nanobots. A good friend helps you however they can and wishes you “happy holidays” sincerely. That doesn’t mean they take it upon themselves to externally measure your happiness and forcibly optimize it. The friend understands that your happiness is truly known only to you and is a result of your intentions, not theirs.
I think happiness/sadness is a signal that evolution has given us for a reason. We tend to do what makes us happy, because evolution thinks it’s best for us. (“Best” is again debatable, I don’t say everyone should function at max evolution). If we remove sadness, we lose this signal. I think that will mean that we don’t know what to do anymore, perhaps become extremely passive.
Pain and pleasure can be useful signals in many situations. But to your point about it not being best to function at max evolution: our evolved tendency to greedily crave pleasure and try to cling to it causes unnecessary suffering. A person can remain happy regardless of whether a particular sensation is pleasurable, painful, or neither. Stubbing your toe or getting cut off in traffic is bad enough; much worse is to get furious about it and ruin your morning. A bite of cake is even more enjoyable if you’re not upset that it’s the last bite of the serving. Removing sadness does not remove the signal; it just means you have stopped relating to the signal in an unrealistic way.
If someone wants to do this on an individual level (enlightenment? drug abuse? netflix binging?), be my guest
Drug abuse and Netflix binging are examples of the misguided attempt to cling to pleasurable sensations I mentioned above. There’s no eternal cake, so the question of whether it would be good for a person to eat eternal cake is nonsensical. Any attempt to eat eternal cake is based on ignorance and cannot succeed; it just leads to dissatisfaction and a sugar habit. Your other example, enlightenment, has to do with understanding this and letting go of desires that cannot be satisfied, like the desire for there to be a permanent self. Rather than leading to extreme passivity, this frees up a lot of energy and brain cycles.
With all due respect, I don’t think it’s up to you—or anyone—to say who’s ethically confused and who isn’t. I know you don’t mean it in this way, but it reminds me of e.g. communist re-education camps.
This is a delicate topic, and I do not claim to be among the wisest living humans. But there is such a thing as mental illness, and there is such a thing as mental health. Basic insights like “happiness is better than suffering” and “harm is bad” are sufficiently self-evident to be useful axioms. If we can’t even say that much with confidence, what’s left to say or teach AI about ethics?
Probably our disagreement here stems directly from our different ethical positions: I’m an ethical relativist, you’re a utilitarian, I presume.
No. If I had to pick a single framework, my view is that deontology leads to the best results. However, I think many frameworks can be helpful in different contexts, and they tend to overlap.
I do think it’s valuable to point out that lots of people outside LW/EA have different value systems (and just practical preferences) and I don’t think it’s ok to force different values/preferences on them with an ASI.
Absolutely!
I think you should not underestimate how much “forcing upon” there is in powerful tech.
A very important point. Many people’s instrumental preferences today are already strongly influenced by AI: recommender and ranking algorithms train people to be more predictable by preying on our evolved tendencies toward lust and hatred, patterns that help genes survive while reducing well-being in lived experience. More powerful AI should impinge less on clarity of thought and capacity for decision-making than current implementations do, not more.