Well, I don’t know what many of my preferences should be. How can I find out except by looking for and listening to arguments?
That implies there’s some objectively-definable standard for preferences which you’ll be able to recognize once you see it. Also, it begs the question of what in your current preferences says “I have to go out and get some more/different preferences!” From a goal-driven intelligence’s POV, asking others to modify your prefs in unspecified ways is pretty much the anti-rational act.
I think we need to distinguish between what a rational agent should do, and what a non-rational human should do to become more rational. Nesov’s reply to you also concerns the former, I think, but I’m more interested in the latter here.
Unlike a rational agent, we don’t have well-defined preferences, and the preferences that we think we have can be changed by arguments. What to do about this situation? Should we stop thinking up or listening to arguments, and just fill in the fuzzy parts of our preferences with randomness or indifference, in order to emulate a rational agent in the most direct manner possible? That doesn’t make much sense to me.
I’m not sure what we should do exactly, but whatever it is, it seems like arguments must make up a large part of it.
That arguments modify preference means that you are (denotationally) arriving at different preferences depending on which arguments you encounter. This means that, from the perspective of a specific given preference (or the “true” neutral preference not biased by specific arguments), you fail to obtain the optimal rational decision algorithm, and thus fail to achieve a high-preference strategy. But at the same time, “absence of action” is also an action, so not exploring the arguments may well be the worse choice, since you won’t be moving toward a clearer understanding of your own preference, even if the preference that you eventually come to understand is somewhat biased compared to the unknown original one.
Thus, there is a tradeoff:
Irrational perception of arguments leads to modification of preference, which is bad for the original preference; but
Considering moral arguments leads to a clearer understanding of some preference close to the original one, which allows you to make more rational decisions, which is good for the original preference.
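To make the tradeoff a bit more concrete, here is a minimal toy model of my own (nothing this precise appears in the comment above, and it assumes the gap between preferences can be scored numerically): treat the preference you end up acting on as a noisy, possibly biased estimate of the original one, and compare not exploring arguments (no drift, fuzzy understanding) with exploring them (some drift, sharper understanding).

```python
import random

def expected_loss(bias: float, noise: float, trials: int = 100_000) -> float:
    """Mean squared distance between the original preference (taken to be 0.0)
    and the preference you end up acting on, modeled as bias + Gaussian noise."""
    return sum((bias + random.gauss(0.0, noise)) ** 2 for _ in range(trials)) / trials

random.seed(0)

# Not exploring arguments: no drift away from the original preference,
# but your understanding of it stays fuzzy (high noise).
print(expected_loss(bias=0.0, noise=1.0))   # roughly 1.0

# Exploring arguments: some drift ("irrational perception" of them),
# but a much sharper understanding -- here the sharper understanding wins.
print(expected_loss(bias=0.3, noise=0.2))   # roughly 0.13
```

On these made-up numbers exploring wins whenever the squared drift is smaller than the reduction in variance; with other numbers it loses, which is exactly the tradeoff.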
Please see my reply to Nesov above, too.
I think we shouldn’t try to emulate rational agents at all, in the sense that we shouldn’t pretend to have rationality-style preferences and supergoals; as a matter of fact we don’t have them.
Up to here we seem to agree; we just use different terminology. I just don’t want to conflate rational preferences with human preferences, because the two systems behave very differently.
Just as an example, in signalling theories of behaviour, you may consciously believe that your preferences are very different from what your behaviour is actually optimizing for when no one is looking. A rational agent wouldn’t normally have separate conscious/unconscious minds unless only the conscious part was subject to outside inspection. In this example, it makes sense to update signalling-preferences sometimes, because they’re not your actual acting-preferences.
But if you consciously intend to act out your (conscious) preferences, and also intend to keep changing them in not-always-foreseeable ways, then that isn’t rationality; and where there could be confusion due to context (such as on LW most of the time), I’d prefer not to use the term “preferences” about humans, or at least to make clear what is meant.
FWIW, my preferences have not been changed by arguments in the last 20 years. So I don’t think your “we” includes me.
As an example, consider arguments in the form of proofs or disproofs of statements that you are interested in. Information doesn’t necessarily “change” or “arbitrarily determine” the things you take from it; it may help you to compute an object in which you are already interested, without changing that object, and at the same time be essential for moving forward. If you have an algorithm, it doesn’t mean that you know what this algorithm will give you in the end, what the algorithm “means”. Resist the illusion of transparency.
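A small concrete analogy (mine, not part of the original comment): the digest computed below is completely fixed by the algorithm and its input before the code is ever run; computing it reveals an object we were “already interested in” without changing it.

```python
import hashlib

# The value of this digest is completely determined by the SHA-256 algorithm
# and its input before we ever run the code; computing it does not change the
# object, it only tells us what it already was -- yet without running the
# computation we have no way of knowing it.
digest = hashlib.sha256(b"what do I actually prefer?").hexdigest()
print(digest)
```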
I don’t understand what you’re saying as applied to this argument. That Wei Dai has an algorithm for modifying his preferences and he doesn’t know what the end output of that algorithm will be?
There will always be something about preference that you don’t know, and it’s not a question of modifying preference; it’s a question of figuring out what the fixed, unmodifiable preference implies. Modifying preference is exactly the wrong way to go about this.
If we figure out the conceptual issues of FAI, we’d basically have the algorithm that is our preferences, but not in its infinite and unknowable denotational “execution trace” form.
As Wei says below, we should consider rational agents (who have explicit preferences separate from the rest of their cognitive architecture) separately from humans who want to approximate that in some ways.
I think that if we first define separate preferences, and then proceed to modify them over and over again, this is so different from rational agents that we shouldn’t call it preferences at all. We can talk about e.g. morals instead, or about habits, or biases.
On the other hand, if we define human preferences as ‘whatever human behavior happens to optimize’, then there’s nothing interesting about changing our preferences; it’s something that happens all the time whether we want it to or not. Under this definition, Wei’s statement that he deliberately makes it happen is unclear (the totality of a human’s behaviour, knowledge, etc. is subtly changing over time in any case), so I assumed he was using the former definition.
There is no clear-cut dichotomy between defining something completely at the beginning and doing things arbitrarily as we go. Instead of defining preference for rational agents in a complete, finished form and then seeing what happens, consider a process of figuring out what preference is. This is neither a way to arrive at the final answer at any point, nor a history of observing “whatever happens”. A rational agent is an impossible construct, but something irrational agents aspire to be without ever attaining it. What they want to become isn’t directly related to what they “appear” to strive towards.
I understand. So you’re saying we should indeed use the term ‘preference’ for humans (and a lot of other agents) because no really rational agents can exist.
Actually, why is this true? I don’t know about perfect rationality, but why shouldn’t an agent exist whose preferences are completely specified and unchanging?
Right. Except that truly rational agents might exist, just not ones whose preferences are as powerful as humans’ have every chance to be. And whatever we irrational humans, or our godlike but still, strictly speaking, irrational FAI, try to do, the concept of “preference” still needs to be there.
Again, it’s not about changing preference. See these comments.
An agent can have a completely specified and unchanging preference, but still not know everything about it (and never be able to know everything about it). In particular, this is a consequence of the halting problem: if you have the source code of a program, that code completely specifies whether the program halts, and you may run the code for an arbitrarily long time without ever changing it, yet still not know whether it halts, and never be able to figure that out, unless you are lucky enough to arrive at a solution in this particular case.
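A minimal illustration of this, using the well-known Collatz conjecture as the example (my addition, not part of the original comment): the function below is a complete, unchanging specification, yet whether it halts for every starting value is an open problem.

```python
def collatz_steps(n: int) -> int:
    """Count the Collatz iteration steps from n down to 1.

    This code is a complete and unchanging specification, yet whether it
    halts for *every* positive integer n is the open Collatz conjecture.
    """
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

# Running it on particular inputs settles those "lucky" individual cases,
# but never settles the general question by itself.
print(collatz_steps(27))  # 111
```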
OK, I understand now what you’re saying. I think the main difference, then, between preferences in humans and in perfect (theoretical) agents is that our preferences aren’t separate from the rest of our mind.
I don’t understand this point.
Rational (designed) agents can have an architecture with preferences (decision-making parts) separate from the other pieces of their minds (memory, calculations, planning, etc.). Then it’s easy (well, easier) to reason about changing their preferences, because we can hold the other parts constant. We can ask things like “given what this agent knows, how would it behave under preference system X?”
The agent may also be able to simulate proposed modifications to its preferences without having to simulate its entire mind (which would be expensive). And, indeed, a sufficiently simple preference system may be chosen so that it is not subject to the halting problem and can be reasoned about.
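A rough sketch of the kind of separation described above; the names and structure are my own illustration, not anything proposed in the thread. The agent’s beliefs are held constant while different preference systems, represented as plain utility functions, are swapped in and evaluated.

```python
from typing import Callable, Dict, List, Optional

# Here a "preference system" is just a utility function over outcomes --
# a deliberately simple representation, so reasoning about it (or swapping
# it out) does not require simulating the rest of the agent's mind.
Preference = Callable[[str], float]

class Agent:
    def __init__(self, beliefs: Dict[str, List[str]], preference: Preference):
        self.beliefs = beliefs        # action -> predicted outcomes (held fixed)
        self.preference = preference  # the separable decision-making part

    def choose(self, actions: List[str], preference: Optional[Preference] = None) -> str:
        """Pick the action whose predicted outcomes score best.

        Passing an alternative `preference` answers "given what this agent
        knows, how would it behave under preference system X?" without
        touching its beliefs or planning code.
        """
        pref = preference or self.preference
        return max(actions, key=lambda a: sum(pref(o) for o in self.beliefs[a]))

beliefs = {"work": ["income", "fatigue"], "rest": ["recovery"]}
current = lambda o: {"income": 2.0, "fatigue": -1.0, "recovery": 0.5}.get(o, 0.0)
proposed = lambda o: {"income": 0.5, "fatigue": -1.0, "recovery": 2.0}.get(o, 0.0)

agent = Agent(beliefs, current)
print(agent.choose(["work", "rest"]))            # "work" under the current preference
print(agent.choose(["work", "rest"], proposed))  # "rest" under the proposed one
```

Keeping the preference system this simple is one way to get the property mentioned above: it can be inspected and compared directly, without simulating the whole agent.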
In humans, though, preferences and every other part of our minds influence one another. While I’m having a philosophical discussion about morality and deciding how to update my so-called preferences, my decisions happen to be affected by hunger or tiredness or remembering having had good sex last night. There are lots of biases that are not perceived directly. We can’t make rational decisions easily.
In rational agents that self-modify their preferences, the new prefs are determined by the old prefs, i.e. via second-order prefs. But in humans, prefs are potentially determined by the entire state of mind, so perhaps we should talk about “modifying our minds” rather than our prefs, since it’s hard to completely exclude most of our mind from the process.
As per Pei Wang’s suggestion, I’m stating that I’m going to opt out of this conversation until you take seriously (accept/investigate/argue against) the statement that preference is not to be modified, something that I stressed in several of the last comments.
There are other relevant differences as well, of course. For instance, a good rational agent would be able to literally rewrite its preferences, while humans have trouble with self-binding their future selves.