I think the main difference, then, between preferences in humans and in perfect (theoretical) agents is that our preferences aren’t separate from the rest of our mind.
Rational (designed) agents can have an architecture in which the preferences (the decision-making parts) are separate from other pieces of their minds (memory, calculation, planning, etc.). Then it’s easy (well, easier) to reason about changing their preferences, because we can hold the other parts constant. We can ask things like “given what this agent knows, how would it behave under preference system X?”
The agent may also be able to simulate proposed modifications to its preferences without having to simulate its entire mind (which would be expensive). And indeed, the preference system can be chosen to be simple enough that questions about it are decidable (it is not subject to the halting problem), so it can actually be reasoned about.
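To make the separation concrete, here is a minimal toy sketch, not a model of any real agent design. It assumes (hypothetically) that the agent’s knowledge is just a fixed map from each action to the outcome it predicts, and that a preference system is a finite lookup table scoring outcomes. The names `ModularAgent` and `table_preference` are made up for illustration. Holding the knowledge fixed, we can ask how the agent would behave under preference system X versus Y, and because a table lookup always terminates, evaluating a proposed preference system never runs into the halting problem.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical toy model: "knowledge" maps each available action to the
# outcome the agent predicts that action will lead to.
Knowledge = Dict[str, str]

# The preference system is a separate component that only scores outcomes.
Preference = Callable[[str], float]


@dataclass(frozen=True)
class ModularAgent:
    knowledge: Knowledge  # memory / world model, held constant below

    def act(self, prefs: Preference) -> str:
        # Decision making: pick the action whose predicted outcome scores highest.
        return max(self.knowledge, key=lambda action: prefs(self.knowledge[action]))


def table_preference(table: Dict[str, float]) -> Preference:
    # A deliberately simple preference system: a finite lookup table.
    # Evaluating it is a dictionary lookup, so it always terminates.
    return lambda outcome: table.get(outcome, 0.0)


# Same knowledge, two candidate preference systems.
agent = ModularAgent(knowledge={"eat": "fed", "sleep": "rested", "read": "informed"})
prefs_x = table_preference({"fed": 1.0, "rested": 3.0, "informed": 2.0})
prefs_y = table_preference({"fed": 5.0, "rested": 0.5, "informed": 2.0})

print(agent.act(prefs_x))  # sleep
print(agent.act(prefs_y))  # eat
```

Only the preference argument changes between the two calls, which is the “hold the other parts constant” move; in a human, there is no clean `knowledge` object to freeze.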
In humans, though, preferences and every other part of our minds influence one another. While I’m holding a philosophical discussion about morality and deciding how to update my so-called preferences, my decisions are also being affected by hunger or tiredness or by remembering having had good sex last night. Many of these biases are not perceived directly, so we can’t easily make rational decisions.
In a rational agent that modifies its own preferences, the new preferences are determined by the old ones, i.e. via second-order preferences. But in humans, preferences are potentially determined by the entire state of mind, so perhaps we should talk about “modifying our minds” rather than our preferences, since it’s hard to keep the rest of our mind out of the process.
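A toy sketch of that contrast, under loudly hypothetical assumptions: first-order preferences are finite tables over outcomes, a second-order preference is a function that scores candidate tables, and “mind state” is a dict of unrelated quantities such as hunger. The functions `rational_update` and `humanlike_update` are illustrative names, not anyone’s actual model. In the first, the new preferences depend only on the old (second-order) preferences; in the second, hunger leaks into the choice.

```python
from typing import Callable, Dict, List

# A first-order preference: a finite table scoring outcomes.
Prefs = Dict[str, float]

# Hypothetical second-order preference: a score for each candidate
# first-order table. It belongs to the agent's current ("old") preferences.
SecondOrder = Callable[[Prefs], float]


def rational_update(second_order: SecondOrder, candidates: List[Prefs]) -> Prefs:
    # New preferences are a function of the old second-order preferences
    # and the candidates alone; nothing else about the mind enters.
    return max(candidates, key=second_order)


def humanlike_update(second_order: SecondOrder, candidates: List[Prefs],
                     mind_state: Dict[str, float]) -> Prefs:
    # Toy stand-in for a human: unrelated state (here, hunger) leaks into the
    # evaluation and favors candidates that value being fed.
    hunger = mind_state.get("hunger", 0.0)
    return max(candidates, key=lambda p: second_order(p) + hunger * p.get("fed", 0.0))


# Second-order preference: "I'd rather care more about being informed."
prefer_curiosity: SecondOrder = lambda p: p.get("informed", 0.0)

candidates = [
    {"fed": 1.0, "informed": 3.0},
    {"fed": 4.0, "informed": 2.0},
]

print(rational_update(prefer_curiosity, candidates))
# {'fed': 1.0, 'informed': 3.0}
print(humanlike_update(prefer_curiosity, candidates, {"hunger": 1.0}))
# {'fed': 4.0, 'informed': 2.0}
```

With `{"hunger": 0.0}` the two updates agree; the point is only that in the second one the outcome is a function of more than the old preferences.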
Then it’s easy (well, easier) to reason about changing their preferences because we can hold the other parts constant.
As per Pei Wang’s suggestion, I’m stating that I’m going to opt out of this conversation until you take seriously (accept, investigate, or argue against) the statement that preference is not to be modified, a point I stressed in several of my last comments.
There are other relevant differences as well, of course. For instance, a good rational agent would be able to literally rewrite its preferences, while humans have trouble binding their future selves.
I don’t understand this point.