Are you saying that my description (following) is incorrect?
[incomplete preferences w/ caprice] would be equivalent to 1. choosing the best policy by ranking them in the partial order of outcomes (randomizing over multiple maxima), then 2. implementing that policy without further consideration.
Or are you saying that it is correct, but you disagree that this implies that it is “behaviorally indistinguishable from an agent with complete preferences”? If this is the case, then I think we might disagree on the definition of “behaviorally indistinguishable”. I’m using it like: if you observe a single sequence of actions from this agent (and you know the agent’s world model), can you construct a utility function over outcomes that could have produced that sequence?
Or consider another example. The agent trades A for B, then B for A, then declines to trade A for B+. That’s compatible with the Caprice rule, but not with complete preferences.
This is compatible with a resolute outcome-utility maximizer (for whom A is a maximum). There’s no rule that says an agent must take the shortest route to the same outcome (right?).
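To make the disagreement concrete, here’s a small sketch (my own illustration, not from the thread; the integer-utility search and function names are assumptions). It checks whether the observed sequence — trade A for B, trade B for A, decline A for B+ — can be rationalized by a single utility function either myopically (each choice weakly improves utility) or resolutely (only the final outcome matters):

```python
from itertools import product

trades = [("A", "B"), ("B", "A")]  # trades the agent accepted
declined = [("A", "B+")]           # trades the agent declined
outcomes = ["A", "B", "B+"]

def myopically_rational(u):
    # Each accepted trade weakly improves utility, each declined trade
    # weakly worsens it, and B+ is strictly better than B (a sweetened B).
    return (all(u[y] >= u[x] for x, y in trades)
            and all(u[x] >= u[y] for x, y in declined)
            and u["B+"] > u["B"])

def assignments():
    # Enumerate all utility functions up to ordinal equivalence.
    for vals in product(range(3), repeat=3):
        yield dict(zip(outcomes, vals))

myopic = any(myopically_rational(u) for u in assignments())
# A resolute outcome-utility maximizer only needs the final holding, A,
# to be weakly best among the outcomes on offer:
resolute = any(u["A"] >= u["B+"] and u["B+"] > u["B"]
               for u in assignments())

print(myopic, resolute)  # → False True
```

The myopic search fails because accepting both trades forces u(A) = u(B), while declining B+ forces u(A) ≥ u(B+) > u(B), a contradiction — that’s the sense in which the sequence is incompatible with complete preferences over individual trades. The resolute search succeeds with, e.g., u(A) = 2, u(B+) = 1, u(B) = 0.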
As Gustafsson notes, if an agent uses resolute choice to avoid the money pump for cyclic preferences, that agent has to choose against their strict preferences at some point. ... There’s no such drawback for agents with incomplete preferences using resolute choice.
Sure, but why is that a drawback? It can’t be money pumped, right? Agents following resolute choice often choose against their local strict preferences in other decision problems (e.g. Newcomb’s problem), and this is considered an argument in favour of resolute choice.