Terminal Bias

I’ve seen people on LessWrong taking cognitive structures that I consider to be biases as terminal values. Take risk aversion, for example:
Risk Aversion
For a rational agent with goals that don’t include “being averse to risk”, risk aversion is a bias. The correct decision theory acts on expected utility, with the utility of outcomes and the probability of outcomes factored apart and calculated separately. Risk aversion does not keep them factored apart.
EDIT: There is some contention on this. Just substitute “that thing minimax algorithms do” for “risk aversion” in my writing. /EDIT
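To make the factoring concrete, here is a minimal sketch with made-up numbers (my own toy example, nothing rigorous): expected utility multiplies each outcome’s utility by its probability and sums, while a strongly risk-averse rule collapses a gamble down to its worst outcome and never looks at the probabilities at all.

```python
# Toy gambles as lists of (probability, utility) pairs; the numbers are made up.
safe = [(1.0, 50)]                # a guaranteed 50 utility
risky = [(0.5, 0), (0.5, 120)]    # a coin flip between 0 and 120

def expected_utility(gamble):
    """Correct decision theory: weight each outcome's utility by its probability."""
    return sum(p * u for p, u in gamble)

def worst_case(gamble):
    """A strongly risk-averse rule: judge a gamble by its worst outcome only."""
    return min(u for _, u in gamble)

print(max([safe, risky], key=expected_utility))  # picks risky (EU 60 vs 50)
print(max([safe, risky], key=worst_case))        # picks safe  (worst case 50 vs 0)
```

Even on this tiny example the two rules disagree, and the disagreement comes entirely from whether the probabilities are allowed to matter.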
A while ago, I was working through the derivation of A* and minimax planning algorithms from a Bayesian and decision-theoretic base. When I was trying to understand the relationship between them, I realized that strong risk aversion, aka minimax, saves huge amounts of computation compared to the correct decision theory, and actually becomes closer to optimal as the environment becomes more dominated by rational opponents. The best way to win is to deny the opponents any opportunity to weaken you. That’s why minimax is a good algorithm for chess.
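For concreteness, here is a minimal minimax sketch over a hand-built game tree (illustrative only, not the derivation I was working through): on your turn you take the best child, and on the opponent’s turn you assume they pick whatever is worst for you.

```python
# A node is either a number (terminal utility for us) or a list of child nodes.
def minimax(node, our_turn):
    if isinstance(node, (int, float)):  # terminal position
        return node
    values = [minimax(child, not our_turn) for child in node]
    # Our move: take the best child. Their move: assume the worst for us.
    return max(values) if our_turn else min(values)

# Depth-2 tree: we move first, then a rational opponent replies.
tree = [[3, 7], [5, -20]]
print(minimax(tree, our_turn=True))  # 3: avoid the branch where they can hit us for -20
```

Notice that no probabilities appear anywhere: the opponent is modeled as perfectly adversarial, which is exactly what makes the computation cheap and the behavior look like extreme risk aversion.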
Current theories about the origin of our intelligence say that we became smart to outsmart our opponents in complex social games. If our intelligence was built for adversarial games, I am not surprised at risk aversion.
A better theoretical replacement, together with a plausible causal history for why we have the bias instead of the correct algorithm, is convincing to me as an argument against risk aversion as a value, the way a rectangular 13x7 pebble heap is convincing to a pebble sorter as an argument against the correctness of a heap of 91 pebbles: it seems undeniable, but I don’t have access to the hidden values that would say for sure.
And yet I’ve seen people on LW state that their “utility function” includes risk aversion. Because I don’t understand the values involved, all I can do is state the argument above and see if other people are as convinced as I am.
It may seem silly to take a bias as terminal, but there are examples with similar arguments that are less clear-cut, and some that we take as uncontroversially terminal:
Responsibility and Identity
The feeling that you are responsible for some things and not others (say, the safety of your family, but not people being tortured in Syria) seems noble and practical. But I take it to be a bias.
I’m no evolutionary psychologist, but it seems to me that feelings of responsibility are a quick hack to kick you into motion when you can affect the outcome and the utility at stake is large. For the most part, this aligns well with utilitarianism; you usually don’t feel responsible for things you can’t really affect, like people being tortured in Syria, or the color of the sky. You do feel responsible for pulling a passed-out kid off the train tracks, but maybe you don’t feel responsible for giving them some fashion advice.
Responsibility seems to be built on identity, so it starts to go weird when you identify or don’t identify in ways that didn’t happen in the ancestral environment. Maybe you identify as a citizen of the USA, but not of Syria, so you feel shame and responsibility about the US torturing people, but the people being tortured in Syria are not your responsibility, even though both cases are terrible, and there is very little you can do about either. A proper utilitarian would feel approximately the same desire to do something about each, but our responsibility hack emphasizes responsibility for the actions of the tribe you identify with.
You might feel a great responsibility to defend your past actions but not those of other people, even though neither is worth “defending”. A rational agent would learn from both the actions of their own past selves and those of other people without seeking to justify or condemn; they would update and move on. There is no tribal council that will exile you if you change your tune or don’t defend yourself.
You might be appalled that someone wishes to stop feeling responsibility for their past selves: “but if they don’t feel responsibility for their actions, what will prevent them from murdering people, or encourage them to do good?” A rational utilitarian would do good and not do evil because they wish good and non-evil to be done, instead of because of feelings of responsibility that they don’t understand.
This argument is a little harder to see and possibly a little less convincing, but again I am convinced that identity and responsibility are inferior to utilitarianism, though they may have seemed almost terminal.
Justice
Surely justice is a terminal value; it feels so noble to desire it. Again I consider the desire for justice to be a biased heuristic.
In game theory, one of the most successful known strategies for the iterated prisoner’s dilemma is tit-for-tat: cooperate and be nice, but punish defectors. Tit-for-tat looks a lot like our instincts for justice, and I’ve heard that the prisoner’s dilemma is a simplified analog of many of the situations that came up in the ancestral environment, so I am not surprised that we have an instinct for it.
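As a minimal sketch (my own toy encoding, nothing formal), tit-for-tat is about three lines of code:

```python
# Tit-for-tat for the iterated prisoner's dilemma: "C" = cooperate, "D" = defect.
def tit_for_tat(their_history):
    if not their_history:
        return "C"              # be nice on the first round
    return their_history[-1]    # then copy their last move: punish defection, reward cooperation

# Against an opponent who defects once, it punishes once and then returns to cooperating.
them = ["C", "D", "C", "C"]
ours = [tit_for_tat(them[:i]) for i in range(len(them))]
print(ours)  # ['C', 'C', 'D', 'C']
```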
It’s nice that we have a hardware implementation of tit-for-tat, but to the extent that we take it as terminal instead of instrumental-in-some-cases, it will make mistakes. It works well when individuals might choose to defect from the group for greater personal gain, but what if we discover, for example, that some murders are not calculated defections, but failures of self-control caused by a bad upbringing and lack of education? What if we then further discover that there is a two-month training course with a high success rate of turning murderers into productive members of society? When Dan the Deadbeat kills his girlfriend, and the psychologists tell us he is a candidate for the rehab program, we can demand justice like we feel we ought to, at a cost of hundreds of thousands of dollars and a good chunk of Dan’s life, or we can run Dan through the two-month training course for a few thousand dollars, transforming him into a good, normal person. People who take punishment of criminals as a terminal value will choose prison for Dan, but people with other interests would say rehab.
One problem with this story is that the two-month murder rehab seems wildly impossible, but so do all of Omega’s tricks. I think it’s good to stress our theories at the limits; they seem to come out stronger, even for normal cases.
I was feeling skeptical about some people’s approach to justice theory when I came up with this one, so I was open to changing my understanding of justice. I am now convinced that justice and punishment instincts are instrumental, and only approximations of the correct game theory and utilitarianism. The problem is that while I was convinced, someone who takes justice as terminal, and is not open to the idea that it might be wrong, is absolutely not convinced. They will say, “I don’t care if it is more expensive, or that you have come up with something that ‘works better’; it is our responsibility to the criminal to punish them for their misdeeds.” Part of the reason for this post is that I don’t know what to say to this. All I can do is state the argument that convinced me, ask if they have something to protect, and feel like I’m arguing with a rock.
Before anyone who is still with me gets enthusiastic about the idea that knowing a causal history and an instrumentally better way is enough to turn a value into a bias, consider the following:
Love, Friendship, and Flowers
See The Gift We Give To Tomorrow. That post contains plausible histories for why we ended up with nice things like love, friendship, and beauty, and hints that could lead you to ‘better’ replacements made out of game theory and decision theory.
Unlike the other examples, where I felt a great “Aha!” and decided to use the superior replacements when appropriate, this time I feel scared. I thought I had it all locked out, but I’ve found some existential angst lurking in the basement.
Love and such seem like something to protect: I don’t care if there are better solutions to the problem they were built to solve, and I don’t care if game theory and decision theory lead to more optimal replication. If I’m worried that love will go away, then there’s no reason I ought to let it. But these are the same arguments the people who take justice as terminal would make. What is the difference that makes it right this time?
Worrying and Conclusion
One answer to this riddle is that everyone is right with respect to themselves, and there’s nothing we can do about disagreements. There’s nothing someone who has one interpretation can say to another to justify their values against some objective standard. By the full power of my current understanding, I’m right, but so is someone who disagrees.
On the other hand, maybe we can do some big million-variable optimization on the contradictory values and heuristics that make up ourselves and come to a reflectively coherent understanding of which are values and which are biases. Maybe none of them have to be biases; it makes sense and seems acceptable that sometimes we will have to go against one of our values for greater gain in another. Maybe I’m asking the wrong question.
I’m confused; what does LW think?
Solution
I was confused about this for a while; is it just something that we have to (Gasp!) agree to disagree about? Do we have to do a big analysis to decide once and for all which are “biases” and which are “values”? My favored solution is to dissolve the distinction between biases and values:
All our neat little mechanisms and heuristics make up our values, but they come on a continuum of importance, and some of them sabotage the rest more than others.
For example, all those nice things like love and beauty seem very important, and usually don’t conflict, so they are closer to values.
Things like risk aversion and hindsight bias aren’t terribly important, but because they prescribe otherwise stupid behavior in the decision-theoretic/epistemological realm, they sabotage the achievement of other biases/values, and are therefore a net negative.
This can work for the high-value things like love and beauty and freedom as well: say you are designing a machine that will achieve many of your values; being biased towards making it beautiful over functional could sabotage the achievement of other values. Being biased against having powerful agents interfere with freedom can prevent you from accepting law or safety.
So debiasing is knowing how and when to override less important “values” for the sake of more important ones, like overriding your aversion to cold calculation to maximize lives saved in a shut up and multiply situation.
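For example, a toy “shut up and multiply” calculation, with numbers invented purely for illustration:

```python
# Made-up numbers: option A saves 400 lives for certain;
# option B saves 500 lives with 90% probability and none otherwise.
expected_a = 1.0 * 400            # 400.0 expected lives
expected_b = 0.9 * 500 + 0.1 * 0  # 450.0 expected lives
print("B" if expected_b > expected_a else "A")  # B: the multiplication overrides the comfortable certainty
```

Here the less important value (discomfort with gambling on lives) gets overridden by the more important one (lives actually saved).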