Uncertainty over utility functions, combined with a prior that humans make systematic mistakes, might be enough to handle this, but I agree that this problem seems hard and not yet tackled in the literature. I personally lean towards “expected explicit utility maximizers are the wrong framework to use”.
What framework do you use?
I don’t know yet, but researchers have some preliminary thoughts, which I’m hoping to write about in the future. Also, I realized that what I actually meant to say is that “expected explicit utility maximizers are the wrong framework to use”, not utility functions in general; I’ve edited the parent comment to reflect this. CIRL (Cooperative Inverse Reinforcement Learning) comes to mind as published work that’s moving away from “expected explicit utility maximizers”, even though it does involve a reward function: it models a human-robot system that together optimize some expected utility, but the robot itself is not maximizing an explicitly represented utility function.
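For concreteness, here is a rough sketch of the CIRL setup as I remember it from the Hadfield-Menell et al. paper (the notation is approximate; see the paper for the precise definitions). A CIRL game is a two-player Markov game between a human $H$ and a robot $R$:
\[
M = \langle S, \{A^H, A^R\}, T(s' \mid s, a^H, a^R), \{\Theta, R(s, a^H, a^R; \theta)\}, P_0(s_0, \theta), \gamma \rangle .
\]
Both players receive the same reward, and both aim to maximize the shared objective
\[
\mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^t \, R(s_t, a^H_t, a^R_t; \theta) \right],
\]
but only the human observes the reward parameter $\theta$; the robot starts from the prior $P_0$ and updates a belief over $\theta$ by observing the human’s behavior. So the human-robot system as a whole is aimed at an expected utility, while the robot never has an explicit utility function of its own to maximize, which is the distinction I was gesturing at above.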