I enjoyed reading this! And I hadn’t seen the interpretation of a logistic preference model as approximating Gaussian errors before.
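(For readers who also haven't seen it: a minimal sketch of that interpretation, not taken from the post. If each item's latent utility is observed with independent Gaussian noise, the preference probability is a probit in the utility difference, and the logistic curve tracks that probit closely.)

```python
# Sketch (not from the post): the logistic choice model is numerically very
# close to a Gaussian-error (probit/Thurstone) choice model; 1.702 is the
# standard logistic-probit matching constant.
import numpy as np
from scipy.stats import norm

delta = np.linspace(-5, 5, 201)         # utility differences u_A - u_B
logistic = 1 / (1 + np.exp(-delta))     # P(A preferred) under the logistic model
probit = norm.cdf(delta / 1.702)        # P(A preferred) under Gaussian errors

print("max absolute gap:", np.abs(logistic - probit).max())  # about 0.0095
```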
Since you seem interested in exploring this more, some comments that might be helpful (or not):
What is the largest number of elements we can sort with a given architecture? How does training time change as a function of the number of elements?
How does the network architecture affect the resulting utility function? How do the maximum and minimum of the unnormalized utility function change?
I’m confused why you’re using a neural network; given the small size of the input space, wouldn’t it be easier to just learn a tabular utility function (i.e. one value for each input, namely its utility)? It’s the largest function space you can have but will presumably also be much easier to train than a NN.
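(To make the tabular suggestion concrete, a minimal sketch, with all names and details made up rather than taken from the post: one learnable scalar per input, fit to (winner, loser) pairs with the logistic loss.)

```python
# Hypothetical sketch: a tabular utility function (one learnable scalar per
# input) fit to (winner, loser) preference pairs with the logistic loss.
import numpy as np

def fit_tabular_utility(prefs, n_items, lr=0.1, steps=2000):
    u = np.zeros(n_items)
    winners = np.array([w for w, _ in prefs])
    losers = np.array([l for _, l in prefs])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(u[winners] - u[losers])))  # P(winner preferred)
        grad = np.zeros(n_items)
        np.add.at(grad, winners, p - 1)   # d(-log p)/du_winner
        np.add.at(grad, losers, 1 - p)    # d(-log p)/du_loser
        u -= lr * grad / len(prefs)
    return u - u.mean()                   # utilities only matter up to a constant

# Consistent preferences over four items, 3 > 2 > 1 > 0:
prefs = [(3, 2), (2, 1), (1, 0), (3, 0), (2, 0), (3, 1)]
print(fit_tabular_utility(prefs, n_items=4))  # increasing in the item index
```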
Questions like the ones you raise could become more interesting in settings with much more complicated inputs. But I think in practice, the expensive part of preference/reward learning is gathering the preferences, and the most likely failure modes revolve around things related to training an RL policy in parallel to the reward model. The architecture etc. seem a bit less crucial in comparison.
Which portion of possible comparisons needs to be presented (on average) to infer the utility function?
I thought about this and very similar questions a bit for my Master’s thesis before changing topics; happy to chat about that if you want to go down this route. (Though I didn’t think about inconsistent preferences, just about the effect of noise. Without either, the answer should just be N log N, I guess.)
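(A toy sketch of the noiseless case, not from the thesis: a consistent, noise-free preference oracle lets you recover the ordering by comparison sorting, so roughly N log N queries suffice.)

```python
# Toy sketch: with a consistent, noise-free preference oracle, recovering the
# full ordering is just comparison sorting, so ~N log2 N queries suffice.
import functools
import math
import random

def queries_needed(n, seed=0):
    items = list(range(n))
    random.Random(seed).shuffle(items)
    count = 0
    def ask(a, b):          # the "human": here items are their own utilities
        nonlocal count
        count += 1
        return -1 if a < b else 1
    sorted(items, key=functools.cmp_to_key(ask))
    return count

for n in (8, 64, 512):
    print(n, queries_needed(n), round(n * math.log2(n)))
```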
How far can we degenerate a preference ordering until no consistent utility function can be inferred anymore?
You might want to think more about how to measure this, or even what exactly it would mean if “no consistent utility function can be inferred”. In principle, for any (not necessarily transitive) set of preferences, we can ask what utility function best approximates these preferences (e.g. in the sense of minimizing loss). The approximation can be exact iff the preferences are consistent. Intuitively, slightly inconsistent preferences lead to a reasonably good approximation, and very inconsistent preferences probably admit only very bad approximations. But there doesn’t seem to be any point where we can’t infer the best possible approximation at all.
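(A hypothetical sketch of that point, using the same kind of tabular logistic fit as above: gradient descent still converges to some best-approximating utility assignment even for a cyclic preference set; the loss just stays bounded away from zero.)

```python
# Hypothetical sketch: a best-approximating utility function exists even for
# cyclic (inconsistent) preferences; the logistic loss just stays bounded
# away from zero instead of going to zero.
import numpy as np

def best_fit_loss(prefs, n_items, lr=0.5, steps=5000):
    u = np.zeros(n_items)
    w = np.array([a for a, _ in prefs])
    l = np.array([b for _, b in prefs])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(u[w] - u[l])))
        grad = np.zeros(n_items)
        np.add.at(grad, w, p - 1)
        np.add.at(grad, l, 1 - p)
        u -= lr * grad / len(prefs)
    p = 1 / (1 + np.exp(-(u[w] - u[l])))
    return -np.mean(np.log(p))

print(best_fit_loss([(0, 1), (1, 2), (0, 2)], 3))  # consistent: loss near 0
print(best_fit_loss([(0, 1), (1, 2), (2, 0)], 3))  # cyclic: stuck near log 2 = 0.69
```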
Related to this (but a bit more vague/speculative): it’s not obvious to me that approximating inconsistent preferences using a utility function is the “right” thing to do. At least in cases where human preferences are highly inconsistent, this seems kind of scary. Not sure what we want instead (maybe the AI should point out inconsistencies and ask us to please resolve them?).
Awesome, thanks for the feedback Eric! And glad to hear you enjoyed the post!
I’m confused why you’re using a neural network
Good point; for the example in the post it was total overkill. The reason I went with a NN was to demonstrate the link to the usual setting in which preference learning is applied. And in general, NNs generalize better than the table-based approach (see also my response to Charlie Steiner).
happy to chat about that
I definitely plan to write a follow-up to this post; I’ll come back to your offer when that follow-up reaches the front of my queue :)
But there doesn’t seem to be any point where we can’t infer the best possible approximation at all.
Hadn’t thought about this before! Perhaps it could work to compare the inferred utility function with a random baseline? That is, the baseline policy would be “for every comparison, flip a coin and make that your prediction of the human’s preference”.
If this happens to accurately describe how the human makes decisions, then the utility function should not be able to perform better than the baseline (and might even perform worse). How much more structure would we have to add to the human’s choices before the utility function performs better than the random baseline?
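(A rough sketch of how that comparison could look, with all details made up: simulate a human whose choices range from utility-driven to essentially random, fit a crude utility estimate, and check held-out accuracy against the 50% coin-flip baseline.)

```python
# Rough sketch (details made up): compare held-out preference accuracy of a
# simple utility estimate against the 50% coin-flip baseline, as the simulated
# human's choices go from utility-driven to essentially random.
import numpy as np

rng = np.random.default_rng(0)
n_items = 10
true_u = rng.normal(size=n_items)

def simulate(noise, n_pairs=500):
    """Return (winner, loser) pairs from a noisy comparison of true utilities."""
    prefs = []
    for _ in range(n_pairs):
        i, j = rng.choice(n_items, size=2, replace=False)
        p_i = 1 / (1 + np.exp(-(true_u[i] - true_u[j]) / noise))
        prefs.append((i, j) if rng.random() < p_i else (j, i))
    return prefs

for noise in (0.1, 1.0, 100.0):       # nearly deterministic -> nearly coin-flip human
    train, test = simulate(noise), simulate(noise)
    wins = np.zeros(n_items)          # crude utility estimate: training win counts
    for w, _ in train:
        wins[w] += 1
    accuracy = np.mean([wins[w] > wins[l] for w, l in test])
    print(f"noise={noise}: held-out accuracy {accuracy:.2f} vs. 0.50 baseline")
```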
it’s not obvious to me that approximating inconsistent preferences using a utility function is the “right” thing to do
True! I guess one proposal to resolve these inconsistencies is CEV, although that is not very computable.