There is no clear-cut dichotomy between defining something completely at the outset and doing things arbitrarily as we go. Instead of defining preference for rational agents in a complete, finished form and then seeing what happens, consider a process of figuring out what preference is. This is neither a way to arrive at a final answer at any point, nor a history of observing “whatever happens”. A rational agent is an impossible construct, but something irrational agents aspire to be without ever attaining it. What they want to become isn’t directly related to what they “appear” to strive towards.
I understand. So you’re saying we should indeed use the term ‘preference’ for humans (and a lot of other agents) because no really rational agents can exist.
Actually, why is this true? I don’t know about perfect rationality, but why shouldn’t an agent exist whose preferences are completely specified and unchanging?
I understand. So you’re saying we should indeed use the term ‘preference’ for humans (and a lot of other agents) because no really rational agents can exist.
Right. Except that really rational agents might exist, just not ones whose preferences are powerful enough, as humans’ preferences have every chance to be. And whatever we irrational humans, or our godlike but still, strictly speaking, irrational FAI, try to do, the concept of “preference” still needs to be there.
Actually, why is this true? I don’t know about perfect rationality, but why shouldn’t an agent exist whose preferences are completely specified and unchanging?
Again, it’s not about changing preference. See these comments.
An agent can have a completely specified and unchanging preference, but still not know everything about it (and never be able to know everything about it). In particular, this is a consequence of the halting problem: if you have the source code of a program, that code completely specifies whether the program halts, and you may run it for an arbitrarily long time without ever changing it, yet still not know whether it halts, and never be able to figure that out, unless you are lucky enough to arrive at a solution in this particular case.
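As an aside (my illustration, not part of the original exchange): here is a minimal Python sketch of the kind of program being gestured at. Its source code completely determines whether it halts, yet nobody currently knows the answer, because it halts exactly when a counterexample to the Goldbach conjecture exists.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; slow but correct."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def is_sum_of_two_primes(n: int) -> bool:
    """Return True if the even number n can be written as a sum of two primes."""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))


def search_for_goldbach_counterexample() -> int:
    """Halts (returning the counterexample) iff the Goldbach conjecture is false.

    Whether this loop ever terminates is fully determined by the code above,
    but nobody currently knows the answer.
    """
    n = 4
    while True:
        if not is_sum_of_two_primes(n):
            return n
        n += 2


if __name__ == "__main__":
    print(search_for_goldbach_counterexample())
```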
OK, I understand now what you’re saying. I think the main difference, then, between preferences in humans and in perfect (theoretical) agents is that our preferences aren’t separate from the rest of our mind.
I think the main difference, then, between preferences in humans and in perfect (theoretical) agents is that our preferences aren’t separate from the rest of our mind.
I don’t understand this point.
Rational (designed) agents can have an architecture with preferences (decision-making parts) separate from other pieces of their minds (memory, calculations, planning, etc.). Then it’s easy (well, easier) to reason about changing their preferences because we can hold the other parts constant. We can ask things like “given what this agent knows, how would it behave under preference system X?”
The agent may also be able to simulate proposed modifications to its preferences without having to simulate its entire mind (which would be expensive). And, indeed, a sufficiently simple preference system may be chosen so that it is not subject to the halting problem and can be reasoned about.
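To make the separability point concrete, here is a minimal sketch (purely illustrative; the names and the toy world model are my assumptions, not anything proposed in the thread) of an architecture where the preference component is a separate, swappable piece, so the rest of the mind can be held constant while we ask how behaviour changes under preference system X.

```python
from typing import Callable, Dict, List

# A preference scores an outcome; everything else (beliefs, planning) is separate.
Preference = Callable[[str], float]


class Agent:
    def __init__(self, beliefs: Dict[str, str], preference: Preference):
        self.beliefs = beliefs          # memory / world model: held constant
        self.preference = preference    # decision-making part: swappable

    def predicted_outcome(self, action: str) -> str:
        # "Planning": look up what the agent believes this action leads to.
        return self.beliefs.get(action, "unknown")

    def choose(self, actions: List[str]) -> str:
        # Pick the action whose predicted outcome the preference rates highest.
        return max(actions, key=lambda a: self.preference(self.predicted_outcome(a)))


beliefs = {"donate": "others helped", "save": "future secured"}
prefs_x: Preference = lambda outcome: 1.0 if outcome == "others helped" else 0.0
prefs_y: Preference = lambda outcome: 1.0 if outcome == "future secured" else 0.0

# Same beliefs, two preference systems -> two different behaviours.
print(Agent(beliefs, prefs_x).choose(["donate", "save"]))  # donate
print(Agent(beliefs, prefs_y).choose(["donate", "save"]))  # save
```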
In humans though, preferences and every other part of our minds influence one another. While I’m holding a philosophical discussion about morality and deciding how to update my so-called preferences, my decisions happen to be affected by hunger or tiredness or remembering having had good sex last night. There are lots of biases that are not perceived directly. We can’t make rational decisions easily.
In rational agents that self-modify their preferences, the new prefs are determined by the old prefs, i.e. via second-order prefs. But in humans, prefs are potentially determined by the entire state of mind, so perhaps we should talk about “modifying our minds” and not our prefs, since it’s hard to completely exclude most of our mind from the process.
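A toy sketch of that first mechanism (again my own illustration, with made-up outcome names): candidate new preference functions are themselves scored by the current preference, so the old prefs fully determine which new prefs get adopted.

```python
from typing import Callable, List

Preference = Callable[[str], float]

OUTCOMES = ["others helped", "future secured", "nothing done"]


def endorsement(current: Preference, candidate: Preference) -> float:
    """How much the *current* preference values the choice the candidate would make."""
    chosen = max(OUTCOMES, key=candidate)   # what the candidate preference would pick
    return current(chosen)                  # scored by the old (current) preference


def self_modify(current: Preference, candidates: List[Preference]) -> Preference:
    """Adopt whichever candidate preference the current preference endorses most."""
    return max(candidates, key=lambda c: endorsement(current, c))


current: Preference = lambda o: {"others helped": 1.0}.get(o, 0.0)
candidates: List[Preference] = [
    lambda o: {"future secured": 1.0}.get(o, 0.0),
    lambda o: {"others helped": 0.9, "future secured": 0.5}.get(o, 0.0),
]

new_prefs = self_modify(current, candidates)
# The old prefs pick a successor that still agrees with them:
print(max(OUTCOMES, key=new_prefs))  # -> "others helped"
```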
Then it’s easy (well, easier) to reason about changing their preferences because we can hold the other parts constant.
As per Pei Wang’s suggestion, I’m stating that I’m going to opt out of this conversation until you take seriously (accept/investigate/argue against) the statement that preference is not to be modified, something that I stressed in several of the last comments.
There are other relevant differences as well, of course. For instance, a good rational agent would be able to literally rewrite its preferences, while humans have trouble with self-binding their future selves.