I show the sequence to the AI and say, “CEV shouldn’t work like this—this is a negative example of CEV.”
“Example registered,” says the young AI. “Supplementary query: Identify first forbidden transition, state general rule prohibiting it?”
“Sorry AI, I’m not smart enough to answer that. Can you make me a little smarter?”
“No problem. State general rule for determining which upgrade methods are safe?”
″...Damn.”
How about “take the Small Accretions objection into consideration”? It’s objection 3a here.
The first forbidden transition would be the very first one, of course—it would be a heck of a coincidence to get the first few steps right but not know what you’re doing.
This is just guessing, but it seems like “more altruism” is the sort of thing one thinks one should say, while not actually being specific enough to preserve your values. This goes to leplen’s point: there isn’t any single direction of improvement called “more altruism.”
Asking for more altruism via some specific, well-understood mechanism might at least illuminate the flaws.
The general rule could be: Don’t let your applause lights generalize automatically.
Just because “altruism” is an applause light, it does not mean we should optimize the universe to be altruistic towards rocks.
The last forbidden transition would be the very last one, since it’s outright wrong while the previous ones do seem to have reasons behind them.
Very good point that I think clarified this for me.
Per Wikipedia, “Altruism or selflessness is the principle or practice of concern for the welfare of others.” That seems like a plausible definition, and I think it illustrates what’s wrong with this whole chain. The issue here is not increasing concern or practice but expanding the definition of “others”; that is, bringing more people/animals/objects into the realm of concern. So if we taboo altruism, the question becomes: to whom or what, and to what degree, should we practice concern? Furthermore, on what grounds should we do this?
For instance, if the real principle is to increase pleasure and avoid pain, then we should have concern for humans and higher animals, but not care about viruses, plants, or rocks. (I’m not saying that’s the right fundamental principle; just an example that makes it clearer where to draw the line.)
In other words, altruism is not a good in itself. It needs a grounding in something else. If the grounding principle were something like “Increase the status and success of my tribe”, then altruistic behavior could be very negative for other tribes.
One thing maybe worth looking at is the attractor set of the CEV process. If the attractor set is small, this means the final outcome is determined more by the CEV process than by the initial values.
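As a toy illustration of what a small attractor set would mean (the update rule below is invented purely for illustration, not a claim about how CEV actually works): run some value-update map from many random starting points and count the distinct endpoints. If almost all of the variation in the starting values washes out, the endpoint is mostly a fact about the process rather than about where you began.

    import random

    def extrapolate(values, steps=50):
        # Toy "extrapolation": each step nudges every value toward +1 or -1
        # and clamps it there. Stands in for one round of reflection.
        for _ in range(steps):
            values = tuple(max(-1.0, min(1.0, v + (0.2 if v >= 0 else -0.2)))
                           for v in values)
        return values

    # Many random starting value-vectors, very few distinct endpoints.
    starts = [tuple(random.uniform(-1, 1) for _ in range(2)) for _ in range(10_000)]
    attractors = {extrapolate(s) for s in starts}
    print(len(starts), "starting points ->", len(attractors), "attractors")

Here ten thousand starting points collapse onto at most four endpoints, which is the “process dominates initial values” situation.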
Or maybe it means that objective morality exists. You never know :-)
Suppose ten trillion moral starting points, a thousand attractors. Then moral realism is certainly wrong, but the process is clearly flawed.
Really? Why?
It seems perfectly plausible to me that there might be many fewer satisfactory endpoints than starting points. In most optimization processes, there’s at most a discrete set of acceptable endpoints, even when there are uncountably infinitely many possible places to start.
Why would it indicate a flaw in CEV if the same turned out to be true there?
I think his issue is that there are multiple attractors.
I agree, though perhaps morality could be disjunctive.
Suggestions for possible general rule:
A: Simulate an argument between the individual at State 1 and State 7. If the individual at State 1 is ultimately convinced, then State 7 is CEV whatever the real State 1 individual thinks. If the individual at State 1 is ultimately unconvinced, it isn’t.
If, say, the individual at State 1 is convinced by State 4’s values but not by State 7’s values (arbitrary choices), then the extrapolation counts as CEV up to the point where the individual at State 1 would cease to be convinced by the argument even while seeing the logical connections.
B: Simulate an argument between the individual at State 1 and the individual at State 7, under the assumption that both of them perfectly follow their own rules for proper argument, incorporating an appropriate amount of emotion and rationality (by their subjective standards) and getting rid of what they consider to be undue biases. The same rule as in A applies for further interpretation.
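A minimal sketch of how proposal A’s cutoff rule could be expressed, assuming a hypothetical convinces(state_1, state_k) oracle standing in for the simulated argument (the oracle and all names here are invented for illustration, not part of the proposal itself):

    def furthest_endorsed_state(states, convinces):
        # states:    ordered sequence [state_1, state_2, ..., state_7]
        # convinces: callable (state_1, state_k) -> bool, a stand-in for the
        #            simulated argument between the two individuals.
        endorsed = states[0]          # State 1 trivially endorses itself
        for later in states[1:]:
            if not convinces(states[0], later):
                break                 # first state State 1 would reject; stop here
            endorsed = later
        return endorsed               # extrapolation counts as CEV up to this state

Note that this treats the first rejected transition as the cutoff, as the proposal describes; it does not check whether some even later state might convince State 1 again.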
Does “ultimately convinced” include “convinced by hypnosis”, “convinced by brainwashing”, “convinced by a clever manipulation”, etc.? How will the AI tell the difference?
(Maybe “convincing by hypnosis” is considered a standard and ethical method of communication with lesser beings in the society of State 7. If person A is provably more intelligent and rational than person B, and person A acts according to generally accepted ethical values, why not make the communication as efficient as possible? To do otherwise would be a waste of resources, which is a crime if the wasted resources could instead be spent on saving people’s lives.)
What if the rules are incompatible?
On point 1: OK, I screwed up slightly. Neither individual is allowed to argue with the other in a manner which the other one would see as brainwashing or unfair manipulation if in possession of all the facts. The system rules out anything deceptive by correcting both parties on anything that is a question of fact.
On point 2: Then they both argue using their own rules of argument. Presumably, the individual at State 1 is unconvinced.
Presumably this means “all the morally relevant facts,” since giving State 1 “all the facts” would be isomorphic to presenting him with the argument-simulation. But determining all the morally relevant facts is a big part of the problem statement. If the AI could determine which aspects of which actions were morally relevant, and what the degree and sign of that moral valence were, it wouldn’t need CEV.
We could lock down the argument more, just to be safe.
I’m not sure whether a text-only channel between State 1 and State 7, allowing only if-then type statements with a probability attached, would allow brainwashing or hypnosis. But I’m also not sure how many State 1 racists would be convinced that racism is unethical, over such a channel.
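A rough sketch of what restricting the channel to “if-then statements with a probability attached” might look like as a message format; the schema and names are invented for illustration, and nothing here settles whether such a format actually blocks manipulation:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ConditionalClaim:
        # One permitted message on the restricted State 1 <-> State 7 channel.
        condition: str      # antecedent, e.g. "you already value fairness within your group"
        consequence: str    # consequent, e.g. "consistency extends that concern to outsiders"
        probability: float  # sender's credence in the consequent given the antecedent

    def validate(message: ConditionalClaim) -> None:
        # Reject anything that is not a well-formed conditional with a credence.
        if not message.condition or not message.consequence:
            raise ValueError("both the if-part and the then-part must be non-empty")
        if not 0.0 <= message.probability <= 1.0:
            raise ValueError("probability must be in [0, 1]")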
How about the individual versions at State 1 and State 7 each getting all the facts that they themselves consider relevant? And maybe a State 1 racist really wouldn’t have a CEV towards non-racism; we just have to accept that.
Wait a minute, I’m confused. I thought CEV meant something closer to “what we would want to do if we were much smarter”. What Stuart suggests sounds more like “what we think we want now, executed by someone much smarter”, i.e. basically the overly-literal genie problem.
But your answer seems to suggest… well, I’m not sure I get what you mean exactly, but it doesn’t sound like you’re pointing to that distinction. What am I missing?
Also, what we would want if we were more the person we wanted to be.
Is that “what we would want if we were more the person we wanted to be”, or “what we would want if we were more the person a much smarter version of us would want to be”? (My understanding of CEV leans towards the latter, and I think your problem is an instance of the former.)
I’m not sure the two are different in any meaningful way. The person we want to be today isn’t well defined; it takes a smarter intelligence to unwind (CEV) our motivations enough to figure out what we mean by “the person we wanted to be.”
Altruism isn’t a binary.
“He decided to increase his altruism equally towards all sentient creatures.”
Equality of “altruism” is impossible. Transition forbidden.