epistemic status: butterfly idea, I really should learn more topology and vector calculus, potentially very fruitful line of research in theoretical ethics
Consider an agent with inconsistent preferences. How can we represent their inconsistent preferences so that they can be turned into consistent variants of themselves, while minimally changing their structure?
One way to approach this is to represent the preferences as a directed graph. In this case, we have for example an intransitive preference such as A → B → C → A.
We can make this preference consistent by deleting an edge, but unfortunately we don’t have a unique solution to making this preference consistent. I am interested in the combinatorics of turning a directed graph into a path graph (or its transitive closure) while making the minimal number of modifications (deleting edges, adding edges, merging edges) to the initial graph.
If we look at preferences over lotteries, we could represent a subset of all possible inconsistent preferences as a vector field over the probability simplex of the underlying options. Circular preferences would then show up as curl in the vector field; violations of continuity would be points where the vectors are “0”: for example, on the line simplex, the vector field “flipping around” at some point on the line (I don’t know vector calculus that well).
It might then be possible to define a function that minimally modifies that vector field to convert it into a consistent preference relation (i.e. a vector field that “flows toward a single option”). Defining “minimally” is the trick here: I’m imagining something akin to integrating, over the simplex, the rotation & size of the difference between the vector fields before and after (I told you I don’t really know the math of that yet).
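For what it’s worth, the “minimal modification” I’m gesturing at may already have a name: the Helmholtz (or Helmholtz–Hodge) decomposition splits a sufficiently smooth vector field v, orthogonally in the L² sense given suitable boundary conditions, into a gradient part and a divergence-free part:

    v = ∇u + w,   with div w = 0

The gradient part ∇u flows along the gradient of a potential u (a “consistent” field in that sense, though possibly toward several local optima rather than a single option), and the circulation lives in w; dropping w would then be the L²-minimal change that turns the field into a gradient field. I haven’t checked the details on a simplex with boundary.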
I suspect that there are interesting further extensions to be discovered in extremal graph theory, for example graphons which look like they might be able to represent more & different preferential inconsistencies, but this is more of a hunch.
If solved, this could be relevant to ambitious value learning and potentially to resolving ontological crises.
Here are some questions I have:
I bet there is research on how to turn inconsistent preferences into consistent ones, but a cursory search hasn’t shown me anything satisfactory yet. Where is it?
I would be fairly sure that there is also research on how to turn arbitrary directed graphs into directed path graphs, but I don’t know about it yet.
What kinds of preferential inconsistencies can be represented via directed graphs/vector fields on probability simplices?
What would the vector field over a probability simplex look like to be equivalent to a consistent preference?
What is the topology of minimally transforming such an “inconsistent” vector field into a “consistent” vector field?
Does any of this actually make sense?
I will continue to look into this here unless someone gives me an argument that definitely kills this.
Here’s a somewhat similar approach:
Represent preferences as a directed graph with edge weights. (A.k.a. a network.)
An edge from A to B with weight W means ‘the agent would trade A for B in exchange for >=W units of utility’.
(I’m still debating whether this definition of edge weight is backwards. There are arguments either way; that said, it’s just a sign flip.)
An edge does not automatically imply a backedge. (Or alternatively, could have a backedge with a weight of infinity.)
A loop (self-edge) is technically allowed, but is ‘interesting’.
Then:
An agent is exploitable if there is any negative cycle (cycle with sum weight negative).
An agent is open to timewasting if there is any zero cycle.
I don’t know if there is a better term here.
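To make these definitions concrete, here is a minimal sketch for a single cycle (the function name and representation are mine; a cycle is just the list of its edge weights):

```python
def classify_cycle(weights):
    """Classify a single cycle by the sum of its edge weights, using the
    convention 'trade A for B in exchange for >=W units of utility'."""
    total = sum(weights)
    if total < 0:
        return "exploitable"           # negative cycle: free utility for the exploiter
    if total == 0:
        return "open to timewasting"   # zero cycle: free round trips
    return "consistent"

# The three 3-element cycles from the examples below:
print(classify_cycle([1, 1, 1]))   # consistent
print(classify_cycle([1, 1, -2]))  # open to timewasting
print(classify_cycle([1, 1, -5]))  # exploitable
```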
There are several approaches to removing the inconsistencies in an inconsistent set of preferences.
One suboptimal but simple approach:
While the graph has a negative cycle of weight -W and length L:
Add W/L to all edges in the cycle.
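A runnable sketch of this loop, under my own choice of representation (nodes as a set, edges as a dict from (u, v) to weight); the negative-cycle search is standard Bellman–Ford from an implicit source with zero-weight edges to every node:

```python
def find_negative_cycle(nodes, edges):
    """Return one negative cycle as a list of nodes, or None if none exists."""
    # Bellman-Ford from an implicit source: every node starts at distance 0.
    dist = {v: 0.0 for v in nodes}
    pred = {v: None for v in nodes}
    last_relaxed = None
    for _ in range(len(nodes)):
        last_relaxed = None
        for (u, v), w in edges.items():
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                pred[v] = u
                last_relaxed = v
    if last_relaxed is None:   # no relaxation on the n-th pass: no negative cycle
        return None
    # Walk back n predecessor steps to guarantee we have landed on the cycle.
    x = last_relaxed
    for _ in range(len(nodes)):
        x = pred[x]
    cycle, v = [x], pred[x]
    while v != x:
        cycle.append(v)
        v = pred[v]
    cycle.reverse()  # consecutive nodes (wrapping around) are edges of the cycle
    return cycle


def make_consistent(nodes, edges, max_iters=1000):
    """While the graph has a negative cycle of weight -W and length L,
    add W/L to all edges in that cycle."""
    edges = dict(edges)
    for _ in range(max_iters):
        cyc = find_negative_cycle(nodes, edges)
        if cyc is None:
            return edges
        cyc_edges = [(cyc[i], cyc[(i + 1) % len(cyc)]) for i in range(len(cyc))]
        W = -sum(edges[e] for e in cyc_edges)  # cycle weight is -W, with W > 0
        for e in cyc_edges:
            edges[e] += W / len(cyc_edges)
    raise RuntimeError("no convergence within max_iters")


# The inconsistent triangle A =1> B =1> C =-5> A from the examples below:
fixed = make_consistent({"A", "B", "C"},
                        {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): -5})
print(fixed)  # {('A', 'B'): 2.0, ('B', 'C'): 2.0, ('C', 'A'): -4.0}
```

On that triangle this adds W/L = 3/3 = 1 to each edge, reproducing the “consistent result” worked out in the examples below.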
Another suboptimal but slightly better approach:
While the graph has a negative cycle:
Find the edge with the largest minimum W/L (that is: for each negative cycle, compute its W/L; assign each edge the minimum W/L over the negative cycles it appears in; pick the edge whose assigned value is largest) and add that W/L to it.
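A sketch of this greedy variant (helper names and representation are mine). One wrinkle: when a single edge absorbs W/L, the cycle’s deficit only shrinks geometrically, so as literally stated the loop need not terminate; I stop once every cycle is above a small tolerance. Ties between edges are broken arbitrarily, and the cycle enumeration is brute-force DFS, fine only for tiny graphs like these:

```python
def simple_cycles(nodes, edges):
    """Enumerate every simple directed cycle once, as a list of nodes."""
    order = {v: i for i, v in enumerate(sorted(nodes))}
    adj = {v: [] for v in nodes}
    for (u, v) in edges:
        adj[u].append(v)
    cycles = []

    def dfs(start, node, path):
        for nxt in adj[node]:
            if nxt == start:
                cycles.append(path[:])
            # only roam over nodes 'after' the root, so each cycle is
            # reported exactly once, rooted at its smallest node
            elif order[nxt] > order[start] and nxt not in path:
                path.append(nxt)
                dfs(start, nxt, path)
                path.pop()

    for s in sorted(nodes):
        dfs(s, s, [s])
    return cycles


def greedy_repair(nodes, edges, tol=1e-9, max_iters=100000):
    """Repeatedly add the max-min W/L to an edge that attains it."""
    edges = dict(edges)
    for _ in range(max_iters):
        edge_min = {}  # edge -> minimum W/L over negative cycles containing it
        for cyc in simple_cycles(nodes, edges):
            es = [(cyc[i], cyc[(i + 1) % len(cyc)]) for i in range(len(cyc))]
            w = sum(edges[e] for e in es)
            if w < -tol:
                ratio = -w / len(es)  # this cycle's W/L
                for e in es:
                    edge_min[e] = min(edge_min.get(e, float("inf")), ratio)
        if not edge_min:
            return edges  # no materially negative cycle remains
        best = max(edge_min, key=edge_min.get)
        edges[best] += edge_min[best]
    raise RuntimeError("no convergence within max_iters")


fixed2 = greedy_repair({"A", "B", "C"},
                       {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): -5})
# With a single cycle, the whole deficit of 3 ends up on one
# (arbitrarily tie-broken) edge, driving the cycle weight to ~0.
```

Because of the arbitrary tie-breaking, the repaired weights can differ from the worked example below (which puts the whole correction on the C-to-A edge) even though the total weight added is the same.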
I strongly suspect that the ‘optimal’ solution is NP-hard[1]. That being said, roughly:
Minimize the total amount of weight added such that there are no negative cycles.
Tiebreak by minimum maximum weight added to a single edge.
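One hedged observation on this formulation: “no negative cycles” has a classic certificate via node potentials (the quantity Bellman–Ford / Johnson’s algorithm computes). A graph has no negative cycle exactly when there are numbers p(v) with

    w(u,v) + p(u) - p(v) >= 0   for every edge (u,v)

since summing around any cycle makes the potentials telescope away, leaving the cycle weight >= 0. If the only allowed modification is adding a nonnegative amount d(u,v) to each edge, the problem becomes

    minimize   sum over edges of d(u,v)
    subject to w(u,v) + d(u,v) + p(u) - p(v) >= 0,   d(u,v) >= 0

which is a linear program in the d’s and p’s, and the tiebreak (minimize the maximum single-edge addition) is LP-expressible too. So at least this variant may be polynomial rather than NP-hard, unless I’m missing a constraint.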
Some examples of a 3-element cycle:
A =1> B =1> C =1> A
(That is, the agent would trade A for B only if you provide >=one unit of utility.)
No inconsistencies here.
A =1> B =1> C =-2> A
No inconsistencies here either, though the agent is open to timewasting. (‘I’ll trade A B C for A B C. Again. Again. Again. Etc.’)
A =1> B =1> C =-5> A
This is inconsistent. Someone could trade A B C for A B C and 3 units of utility.
Consistent result is
A =2> B =2> C =-4> A
A =1> B =1> C =-5> A ; A =1> D =1> C
Consistent results from approaches above:
A =2> B =2> C =-3.33> A ; A =1.67> D =1.67> C
A =1> B =1> C =-2> A ; A =1> D =1> C
A =1> B =1> C =-2> A ; A =1> D =1> C
I strongly suspect there’s a reduction from 3SAT, in particular.
As for the second question, I think I was looking for transitive reduction and graph edit distance.
If this (or a variant) works out it would be pretty cool: One could look at an inconsistent agent and make the preferences consistent in the minimally modifying way, and then know what the agent would ideally want.
One further idea that could incorporate this is the following: We try to learn human preferences, but might worry that we’re not learning them at the right level of abstraction. But if we have some known preferential inconsistencies for humans (e.g. the Allais paradox), we can look at a proposed learned preference and search for them; if they’re not present, we reject the proposal. If they are present, we can then just apply the make_consistent algorithm to the learned preference.