I’m not aware of much other preference learning literature that is relevant to this particular type of value learning.
I can’t imagine there isn’t a single paper out there in the literature about supervised learning of VNM-style utility functions over rich, or even weak, hypothesis spaces.
Here’s a trivial example pulled off one minute’s Googling. It “counts” because the kernel trick is sufficiently rich to include all possible functions over Hilbert spaces.
I do think that if you’ve researched this more thoroughly than I have (I’d bet you have, since it’s your job), the paper really ought to include a critique of the existing literature, so as to characterize what sections of the unevaluated-potential-solution tree for the value-learning problem should be explored first.