Others are critical of moral realism because it postulates the existence of a kind of “moral fact” which is nonmaterial and does not appear to be accessible to the scientific method. Moral truths cannot be observed in the same way as material facts (which are objective), so it seems odd to count them in the same category. One emotivist counterargument (although emotivism is usually non-cognitivist) alleges that “wrong” actions produce measurable results in the form of negative emotional reactions, either within the individual transgressor, within the person or people most directly affected by the act, or within a (preferably wide) consensus of direct or indirect observers.
Can’t you choose your (arational) preferences to get any behaviour (decision theory) no matter what interpretation you choose?
Preferences may be arational, but they’re not completely arbitrary. In moral philosophy there are still arguments for what one’s preferences should be, even if they are generally much weaker than the arguments in rationality. Different interpretations influence what kinds of arguments apply or make sense to you, and therefore influence your preferences.
How can there be arguments about what preferences should be? Aren’t they, well, a sort of unmoved mover, a primal cause? (To use some erstwhile philosophical terms :-)
I can understand meta-arguments that say your preferences should be consistent in some sense, or that argue about subgoal preferences given some supergoals. But even under strict constraints of that kind, you have a lot of latitude, from humans to paperclip maximizers on out. Within that range, does interpreting probabilities differently really give you extra power you can’t get by finetuning your prefs?
Edit: the reason I’d prefer editing prefs is that talking about the Meaning of Probabilities sets off my materialism sensors. It leads to things like multiple-world theories because they’re easy to think about as an interpretation of QM, regardless of whether they actually exist. Then they can actually negatively affect our prefs or behavior.
Well, I don’t know what many of my preferences should be. How can I find out except by looking for and listening to arguments?
No, not for humans anyway.
That implies there’s some objectively-definable standard for preferences which you’ll be able to recognize once you see it. Also, it begs the question of what in your current preferences says “I have to go out and get some more/different preferences!” From a goal-driven intelligence’s POV, asking others to modify your prefs in unspecified ways is pretty much the anti-rational act.
I think we need to distinguish between what a rational agent should do, and what a non-rational human should do to become more rational. Nesov’s reply to you also concerns the former, I think, but I’m more interested in the latter here.
Unlike a rational agent, we don’t have well-defined preferences, and the preferences that we think we have can be changed by arguments. What to do about this situation? Should we stop thinking up or listening to arguments, and just fill in the fuzzy parts of our preferences with randomness or indifference, in order to emulate a rational agent in the most direct manner possible? That doesn’t make much sense to me.
I’m not sure what we should do exactly, but whatever it is, it seems like arguments must make up a large part of it.
That arguments modify preference means that you are (denotationally) arriving at different preferences depending on arguments. This means that, from the perspective of a specific given preference (or “true” neutral preference not biased by specific arguments), you fail to obtain the optimal rational decision algorithm, and thus to achieve a high-preference strategy. But at the same time, “absence of action” is also an action, so not exploring the arguments may well be a worse choice, since you won’t be moving towards a clearer understanding of your own preference, even if the preference that you are going to understand will be somewhat biased compared to the unknown original one.
Thus, there is a tradeoff:
Irrational perception of arguments leads to modification of preference, which is bad for original preference, but
Considering moral arguments leads to a clearer understanding of some preference close to the original one, which allows you to make more rational decisions, which is good for the original preference.
Please see my reply to Nesov above, too.
I think we shouldn’t try to emulate rational agents at all, in the sense that we shouldn’t pretend to have rationality-style preferences and supergoals; as a matter of fact we don’t have them.
Up to here we seem to agree, we just use different terminology. I just don’t want to conflate rational preferences with human preferences because the two systems behave very differently.
Just as an example, in signalling theories of behaviour, you may consciously believe that your preferences are very different from what your behaviour is actually optimizing for when no one is looking. A rational agent wouldn’t normally have separate conscious/unconscious minds unless only the conscious part was subject to outside inspection. In this example, it makes sense to update signalling-preferences sometimes, because they’re not your actual acting-preferences.
But if you consciously intend to act out your (conscious) preferences, and also intend to keep changing them in not-always-foreseeable ways, then that isn’t rationality, and when there could be confusion due to context (such as on LW most of the time) I’d prefer not to use the term “preferences” about humans, or to make clear what is meant.
FWIW, my preferences have not been changed by arguments in the last 20 years. So I don’t think your “we” includes me.
As an example, consider the arguments in form of proofs/disproofs of the statements that you are interested in. Information doesn’t necessarily “change” or “determine arbitrarily” the things you take from it, it may help you to compute an object in which you are already interested, without changing that object, and at the same time be essential in moving forward. If you have an algorithm, it doesn’t mean that you know what this algorithm will give you in the end, what the algorithm “means”. Resist the illusion of transparency.
I don’t understand what you’re saying as applied to this argument. That Wei Dai has an algorithm for modifying his preferences and he doesn’t know what the end output of that algorithm will be?
There will always be something about preference that you don’t know, and it’s not the question of modifying preference, it’s a question of figuring out what the fixed unmodifiable preference implies. Modifying preference is exactly the wrong way of going about this.
If we figure out the conceptual issues of FAI, we’d basically have the algorithm that is our preferences, but not in infinite and unknowable normal “execution trace” denotational “form”.
As Wei says below, we should consider rational agents (who have explicit preferences separate from the rest of their cognitive architecture) separately from humans who want to approximate that in some ways.
I think that if we first define separate preferences, and then proceed to modify them over and over again, this is so different from rational agents that we shouldn’t call it preferences at all. We can talk about e.g. morals instead, or about habits, or biases.
On the other hand if we define human preferences as ‘whatever human behavior happens to optimize’, then there’s nothing interesting about changing our preferences, this is something that happens all the time whether we want it to or not. Under this definition Wei’s statement that he deliberately makes it happen is unclear (the totality of a human’s behaviour, knowledge, etc. is subtly changing over time in any case) so I assumed he was using the former definition.
There is no clear-cut dichotomy between defining something completely at the beginning and doing things arbitrarily as we go. Instead of defining preference for rational agents, in a complete, finished form, and then seeing what happens, consider a process of figuring out what preference is. This is neither a way to arrive at the final answer, at any point, nor a history of observing “whatever happens”. A rational agent is an impossible construct, but something irrational agents aspire to be, never attaining it. What they want to become isn’t directly related to what they “appear” to strive towards.
I understand. So you’re saying we should indeed use the term ‘preference’ for humans (and a lot of other agents) because no really rational agents can exist.
Actually, why is this true? I don’t know about perfect rationality, but why shouldn’t an agent exist whose preferences are completely specified and unchanging?
Right. Except that really rational agents might exist, but not if their preferences are powerful enough, as humans’ have every chance to be. And whatever we irrational humans, or our godlike but still, strictly speaking, irrational FAI try to do, the concept of “preference” still needs to be there.
Again, it’s not about changing preference. See these comments.
An agent can have a completely specified and unchanging preference, but still not know everything about it (and never be able to know everything about it). In particular, this is a consequence of the halting problem: if you have the source code of a program, this code completely specifies whether the program halts, and you may run this code for an arbitrarily long time without ever changing it, but still not know whether it halts, and never be able to figure that out, unless you are lucky enough to arrive at a solution in this particular case.
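To make the point above concrete, here is a minimal sketch in Python. The function name `collatz_steps` and the `budget` parameter are made up for the illustration, and the Collatz iteration is only a stand-in for "some fixed program": the code never changes, yet running it for a bounded time can leave the question "does it halt on this input?" unanswered. A result of `None` means "not known yet", not "never halts". Whether this particular loop halts for every input is an open question (the Collatz conjecture), so the example shows the flavor of the situation rather than an actual undecidability proof.

```python
from typing import Optional


def collatz_steps(n: int, budget: int) -> Optional[int]:
    """Run the fixed Collatz iteration from n for at most `budget` steps.

    Returns the number of steps needed to reach 1, or None if the budget
    ran out first. None means "we still don't know", not "it never halts".
    """
    steps = 0
    while n != 1:
        if steps >= budget:
            return None  # out of patience; the program itself is unchanged
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


# The same fixed code, inspected with different amounts of patience.
print(collatz_steps(27, 10))    # budget too small: answer still unknown
print(collatz_steps(27, 1000))  # enough budget: the answer was fixed all along
```

The answer for 27 was determined by the code from the start; the larger budget did not modify anything, it only let us find out.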
OK, I understand now what you’re saying. I think the main difference, then, between preferences in humans and in perfect (theoretical) agents is that our preferences aren’t separate from the rest of our mind.
I don’t understand this point.
Rational (designed) agents can have an architecture with preferences (decision making parts) separate from other pieces of their minds (memory, calculations, planning, etc.) Then it’s easy (well, easier) to reason about changing their preferences because we can hold the other parts constant. We can ask things like “given what this agent knows, how would it behave under preference system X”?
The agent may also be able to simulate proposed modifications to its preferences without having to simulate its entire mind (which would be expensive). And, indeed, a sufficiently simple preference system may be chosen so that it is not subject to the halting problem and can be reasoned about.
In humans though, preferences and every other part of our minds influence one another. While I’m holding a philosophical discussion about morality and deciding how to update my so-called preferences, my decisions happen to be affected by hunger or tiredness or remembering having had good sex last night. There are lots of biases that are not perceived directly. We can’t make rational decisions easily.
In rational agents that self-modify their preferences, the new prefs are determined by the old prefs, i.e. via second-order prefs. But in humans prefs are potentially determined by the entire state of mind, so perhaps we should talk about “modifying our minds” and not our prefs, since it’s hard to completely exclude most of our mind from the process.
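The separable architecture described above for designed agents can be sketched roughly as follows. This is only an illustration under my own assumptions: the names `Knowledge`, `choose`, `paperclip_pref` and `flower_pref` are hypothetical, and "preference" is reduced to a scoring function over predicted outcomes. The point is just that the knowledge is held constant while the preference system is swapped, which is what makes the question “given what this agent knows, how would it behave under preference system X” easy to ask.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Outcome = str
Preference = Callable[[Outcome], float]  # scores an outcome; higher is better


@dataclass(frozen=True)
class Knowledge:
    """What the agent believes: each available action and its predicted outcome."""
    predictions: Dict[str, Outcome]


def choose(knowledge: Knowledge, preference: Preference) -> str:
    """Pick the action whose predicted outcome the given preference scores highest."""
    return max(knowledge.predictions,
               key=lambda action: preference(knowledge.predictions[action]))


def paperclip_pref(outcome: Outcome) -> float:
    return 1.0 if "paperclips" in outcome else 0.0


def flower_pref(outcome: Outcome) -> float:
    return 1.0 if "flowers" in outcome else 0.0


# Same knowledge, two different preference systems.
knowledge = Knowledge({
    "build_factory": "many paperclips",
    "plant_garden": "many flowers",
})
print(choose(knowledge, paperclip_pref))
print(choose(knowledge, flower_pref))
```

Nothing like this clean separation is available for humans, where the analogue of `preference` would depend on the whole state of the mind doing the choosing.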
As per Pei Wang’s suggestion, I’m stating that I’m going to opt out of this conversation until you take seriously (accept/investigate/argue against) the statement that preference is not to be modified, something that I stressed in several of the last comments.
There are other relevant differences as well, of course. For instance, a good rational agent would be able to literally rewrite its preferences, while humans have trouble with self-binding their future selves.
Re: “How can there be arguments about what preferences should be?”
The idea that some preferences are “better” than other ones is known as “moral realism”.
Wikipedia says moral realists (in general) claim that moral propositions can be true or false as objective facts but their truth cannot be observed or verified. This doesn’t make any sense. Sounds like religion.
Are you looking at http://en.wikipedia.org/wiki/Moral_realism …?
Care to quote an offending section about moral truths not being observable or verifiable?
Under the section “Criticisms”:
Regarding the emotivist criticism, it begs a lot of questions. Surely not all negative emotional reactions signal wrong moral actions. Besides, emotivism isn’t aligned with moral realism.
I see—thanks.
That some criticisms of moral realism appear to lack coherence does not seem to me to be a point that counts against the idea.
I expect moral realists would deny that morality is any more nonmaterial than any other kind of information—and would also deny that it does not appear to be accessible to the scientific method.
If moral realism acts as a system of logical propositions and deductions, then it has to have moral axioms. How are these grounded in material reality? How can they be anything more than “because I said so and I hope you’ll agree”? Isn’t the choice of axioms done using a moral theory nominally opposed to moral realism, such as emotivism, or (amoral) utilitarianism?
One way would be to consider the future of civilization. At the moment, we observe a Shifting Moral Zeitgeist. However, in the future we may see ideas about how to behave towards other agents settle down into an optimal region. If that turns out to be a global optimum—rather than a local one—i.e. much the same rules would be found by most surviving aliens—then that would represent a good foundation for the ideas of moral realism.
Even today, it should be pretty obvious that some moral systems are “better” than others (“better” in the sense of promoting the survival of those systems). That doesn’t necessarily mean there’s a “best” one—but it leaves that possibility open.
It might also sound like science—don’t scientists generally claim that propositions about the world can be true or false, but cannot be directly observed or verified?
Joshua Greene’s thesis “The Terrible, Horrible, No Good, Very Bad Truth about Morality and What to Do About it” might be a decent introduction to moral realism / irrealism. Overall it is an argument for irrealism.
In science, a proposition about the world can generally be proven or disproven with arbitrary probability, so you can become as sure about it as you like if you invest enough resources.
In moral realism, propositions are purely logical constructs, and can be proven true or false just like a mathematical proposition. Their truth is one with the truth of the axioms used, and the axioms can’t be proven or disproven with any degree of certainty; they are simply accepted or not accepted. The morality is internally consistent, but you can’t derive it from the real world, and you can’t derive any fact about the real world from the morality. That sounds just like theology to me. (The difference between this and ordinary math or logic is that mathematical constructs aren’t supposed to lead to should or ought statements about behavior.)
I will read Greene’s thesis, but as far as I can tell it argues against moral realism (and does it well), so it won’t help me understand why anyone would believe in it.