It’s really too bad that we couldn’t do better than Pareto optimal.
It seems like that’s to be expected:
The set of achievable utility vectors is a polytope, and any Pareto optimal point will be an extreme point of that polytope. For each extreme point of the polytope, there exists some linear objective function that is maximized over the polytope at that point. It remains to show that all the weights are non-negative, but that’s taken care of by restricting your attention to the Pareto optimal points, and I suspect that any extreme point that maximizes a linear objective with non-negative weights is Pareto optimal.
We also gain a lot by considering any convex combination of two policies as its own policy; the meat of the conclusion is “you should be willing to make linear tradeoffs between the decision-theoretic utility components of your aggregated decision-theoretic utility function,” which is much cleaner with policy convexity than without it.
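To make the polytope picture concrete, here is a minimal sketch for two agents. The five policies and their utility vectors are made-up numbers, and the function names are my own, not anything from the post. It checks which deterministic policies are undominated, which ones maximize some non-negatively weighted sum, and what changes once mixtures count as policies.

```python
from fractions import Fraction

# Hypothetical utility vectors (u1, u2) for five deterministic policies.
# These numbers are illustrative assumptions, not taken from the post.
POLICIES = {
    "A": (Fraction(4), Fraction(0)),
    "B": (Fraction(3), Fraction(3)),
    "C": (Fraction(0), Fraction(4)),
    "D": (Fraction(1), Fraction(1)),
    "E": (Fraction(3, 2), Fraction(17, 5)),  # (1.5, 3.4)
}


def dominates(p, q):
    """True if p is at least as good as q for everyone and strictly better for someone."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def pareto_optimal(points):
    """Names of points that no other point in `points` dominates."""
    return {
        name
        for name, p in points.items()
        if not any(dominates(q, p) for q in points.values())
    }


def linear_maximizers(points, steps=100):
    """Names of points that maximize w*u1 + (1-w)*u2 for some w on a grid over [0, 1]."""
    winners = set()
    for k in range(steps + 1):
        w = Fraction(k, steps)
        scores = {name: w * p[0] + (1 - w) * p[1] for name, p in points.items()}
        best = max(scores.values())
        winners |= {name for name, s in scores.items() if s == best}
    return winners


def mix(p, q, t):
    """Utility vector of the policy that plays p with probability t and q otherwise."""
    return tuple(t * a + (1 - t) * b for a, b in zip(p, q))


if __name__ == "__main__":
    print(sorted(pareto_optimal(POLICIES)))     # ['A', 'B', 'C', 'E']
    print(sorted(linear_maximizers(POLICIES)))  # ['A', 'B', 'C']
    half_b_half_c = mix(POLICIES["B"], POLICIES["C"], Fraction(1, 2))
    print(dominates(half_b_half_c, POLICIES["E"]))  # True
```

In this toy case E is undominated among the deterministic policies but is not picked out by any non-negative weighting, and a 50/50 mixture of B and C dominates it. Once mixtures count as policies, the undominated deterministic policies are exactly the ones some non-negative weighting selects, which is the sense in which the result is cleaner with policy convexity.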