The post argues at length against completeness. I have a hard time imagining an advanced AGI (one capable of extensive self-reflection) that has many preferences but not a complete preference ordering.
Your argument seems to be something like: “There can be outcomes A and B where neither A⪯B nor B⪯A. This property can be preserved if we sweeten A a little bit: then we have A≺A+, but still neither A+⪯B nor B⪯A+. If faced with a decision between A and B (or between A+ and B), the AGI can do something arbitrary, e.g. just flip a coin.”
I expect advanced AGI systems capable of self-reflection to think about whether A or B seems more valuable (unless it judges the situation so low-stakes that it is not worth thinking about; but computation is cheap, and in AI safety we typically care about high-stakes situations anyway). To use your example: if A is a lottery that gives the agent a Fabergé egg for sure, and B is a lottery that returns the agent’s long-lost wedding album, then I would expect an advanced agent to invest a bit of effort into figuring out which of the two it deems more valuable.
Also, somewhere in the weights/code of the AGI there has to be some decision procedure that specifies what the AGI does when faced with the choice between A and B. It would be possible to hardcode that the AGI flips a coin when faced with a certain choice.
But by default, I expect the choice between A and B to depend on learned heuristics (plus reflection) rather than on anything hardcoded. A plausible candidate here would be a mesa-optimizer, which might have a preference between A and B even when the outer training objective does not encourage a preference between them.
A priori, the following combination of outputs from an advanced AGI seems unlikely and unnatural to me (a minimal sketch of this choice pattern follows the list):
If faced with a choice between A and B, the AGI chooses each with p=0.5
If faced with a choice between A+ and B, the AGI chooses each with p=0.5
If faced with a choice between A+ and A, the AGI chooses A+ with p=1.
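For concreteness, here is a minimal sketch of the decision procedure that pattern of outputs corresponds to (Python, with hypothetical outcome labels and a hand-written preference relation; nothing is claimed about how an actual AGI would implement this): pick the strictly preferred option when one exists, otherwise choose arbitrarily among incomparable options.

```python
import random

# Hypothetical outcomes: "A_plus" is a slightly sweetened version of "A".
# The only strict preference is A_plus over A; neither A nor A_plus is
# comparable with B.
STRICTLY_PREFERRED = {("A_plus", "A")}  # (x, y) means x is strictly preferred to y


def strictly_prefers(x: str, y: str) -> bool:
    return (x, y) in STRICTLY_PREFERRED


def choose(x: str, y: str) -> str:
    """Pick the strictly preferred option if there is one; otherwise choose arbitrarily."""
    if strictly_prefers(x, y):
        return x
    if strictly_prefers(y, x):
        return y
    return random.choice([x, y])  # incomparable (or indifferent): flip a coin


# The three behaviours listed above:
#   choose("A", "B")      -> A or B, each with p = 0.5
#   choose("A_plus", "B") -> A_plus or B, each with p = 0.5
#   choose("A_plus", "A") -> A_plus with p = 1
```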
To the extent that humans are general intelligences and have incomplete preferences (for example, preferential gaps), it seems possible and imaginable to have a generally intelligent agent with incomplete preferences.
A couple of relevant quotes:
Of all the axioms of utility theory, the completeness axiom is perhaps the most questionable. Like others of the axioms, it is inaccurate as a description of real life; but unlike them, we find it hard to accept even from the normative viewpoint.
(Aumann 1962)
Before stating more carefully our goal and the contribution thereof, let us note that there are several economic reasons why one would like to study incomplete preference relations. First of all, as advanced by several authors in the literature, it is not evident if completeness is a fundamental rationality tenet the way the transitivity property is. Aumann (1962), Bewley (1986) and Mandler (1999), among others, defend this position very strongly from both the normative and positive viewpoints. Indeed, if one takes the psychological preference approach (which derives choices from preferences), and not the revealed preference approach, it seems natural to define a preference relation as a potentially incomplete preorder, thereby allowing for the occasional “indecisiveness” of the agents. Secondly, there are economic instances in which a decision maker is in fact composed of several agents each with a possibly distinct objective function. For instance, in coalitional bargaining games, it is in the nature of things to specify the preferences of each coalition by means of a vector of utility functions (one for each member of the coalition), and this requires one to view the preference relation of each coalition as an incomplete preference relation. The same reasoning applies to social choice problems; after all, the most commonly used social welfare ordering in economics, the Pareto dominance, is an incomplete preorder. Finally, we note that incomplete preferences allow one to enrich the decision making process of the agents by providing room for introducing to the model important behavioral traits like status quo bias, loss aversion, procedural decision making, etc.
(Dubra et al. 2001)
Indeed. What would it even mean for an agent not to prefer A over B, and also not to prefer B over A, and also not to be indifferent between A and B?
See my comments on this post for links to several answers to this question.
I read it, but I’m not at all sure it answers the question. It makes three points:
1. “if one takes the psychological preference approach (which derives choices from preferences), and not the revealed preference approach, it seems natural to define a preference relation as a potentially incomplete preorder, thereby allowing for the occasional “indecisiveness” of the agents”
I don’t see how an agent being indecisive is relevant to preference ordering. Not picking A or B is itself a choice—namely, the agent chooses not to pick either option.
2. “Secondly, there are economic instances in which a decision maker is in fact composed of several agents each with a possibly distinct objective function. For instance, in coalitional bargaining games, it is in the nature of things to specify the preferences of each coalition by means of a vector of utility functions (one for each member of the coalition), and this requires one to view the preference relation of each coalition as an incomplete preference relation.”
So, if the AI is made of multiple agents, each with its own utility function, and we use a vector utility function to describe the AI… the AI still makes a particular choice between A and B (or it refuses to choose, which is itself a choice). Isn’t this a flaw of the vector-utility-function description, rather than a real property of the AI?
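To make the vector-utility point concrete, here is a minimal sketch (Python, with made-up numbers and two hypothetical sub-agents) of why Pareto dominance over a vector of sub-agent utilities is an incomplete ordering: when the sub-agents disagree, neither outcome dominates the other.

```python
# Made-up utilities assigned to outcomes A and B by two hypothetical sub-agents.
UTILITIES = {
    "A": (3.0, 1.0),  # sub-agent 1 prefers A, sub-agent 2 prefers B
    "B": (1.0, 3.0),
}


def pareto_dominates(x: str, y: str) -> bool:
    """x Pareto-dominates y: every sub-agent weakly prefers x, and at least one strictly prefers it."""
    ux, uy = UTILITIES[x], UTILITIES[y]
    return all(a >= b for a, b in zip(ux, uy)) and any(a > b for a, b in zip(ux, uy))


# Neither direction holds, so A and B are incomparable under Pareto dominance:
assert not pareto_dominates("A", "B")
assert not pareto_dominates("B", "A")
```

Whether that incomparability is a fact about the composite system or only about this particular description is exactly the question raised above.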
3. “The same reasoning applies to social choice problems; after all, the most commonly used social welfare ordering in economics, the Pareto dominance, is an incomplete preorder.”
I’m not sure how this is related to AI.
Do you have any ideas?