If you have possible targets T1,…,TN, then as an alternative to maximin, you could also maximize log(T1)+⋯+log(TN), or replace log with your favorite function with sufficiently sharply diminishing returns. This has the advantage that the AI will continue to take Pareto improvements rather than often feeling completely neutral about them.
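Here's a minimal sketch of that contrast, with made-up utility numbers: maximin is indifferent between an action and a Pareto improvement of it whenever the improvement doesn't raise the worst-case target, whereas the sum-of-logs score strictly prefers the improvement. (Any strictly increasing per-target function gives you the Pareto property; the sharply diminishing returns are what keep the score from sacrificing one target to run up another.)

```python
import math

# Made-up utilities: how well each action scores on each of three
# candidate targets T1, T2, T3 (all assumed positive so log is defined).
action_a = [0.5, 0.5, 0.5]  # baseline
action_b = [0.9, 0.5, 0.5]  # Pareto improvement over action_a (better on T1)

def maximin_score(utilities):
    # Worst-case utility across the candidate targets.
    return min(utilities)

def log_sum_score(utilities):
    # Sum of log-utilities (equivalently, N times the log of the geometric mean).
    return sum(math.log(u) for u in utilities)

print(maximin_score(action_a), maximin_score(action_b))  # 0.5 0.5 -> indifferent
print(log_sum_score(action_a), log_sum_score(action_b))  # ~-2.08 ~-1.49 -> prefers b
```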
(That’s just a fun idea that I think I got indirectly from Stuart Armstrong … but I’m skeptical that this would really be relevant: I suspect that if you have any uncertainty about the target at all, it would rapidly turn into a combinatorial explosion of possibilities, such that it would be infeasible for the AI to keep track of how an action scores on all gazillion of them.)
(In normal planning problems the number of plans to evaluate is already exponential in the number of actions. So that doesn't seem to be a major obstacle if your agent is already capable of planning.)
Geometric rationality ftw!