While I agree with you, I also acknowledge that allowing the weights of a multidimensional model to change over time is an inconsistency that violates the VNM utility axioms, and it means the agent can be money-pumped (repeatedly making locally preferable decisions, each of which loses some long-term value for the agent).
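To make the money-pump point concrete, here's a minimal toy sketch (my own illustration, not anything from the original discussion): an agent whose pairwise preferences cycle will pay a small fee for each "upgrade" and end up holding exactly what it started with, strictly poorer.

```python
# Toy money pump: cyclic preferences A > B > C > A make every offered
# "upgrade" look locally attractive, so the agent trades in a circle
# and bleeds a fee on each step.

PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y) means x is strictly preferred to y
FEE = 1.0  # what the agent will pay to swap to something it prefers

def accepts_trade(current, offered):
    """The agent trades whenever it strictly prefers the offered item."""
    return (offered, current) in PREFERS

def run_money_pump(start_item="A", rounds=6):
    item, wealth = start_item, 0.0
    # Always offer the item the agent prefers to whatever it currently holds.
    offers = {"A": "C", "B": "A", "C": "B"}
    for _ in range(rounds):
        offered = offers[item]
        if accepts_trade(item, offered):
            item = offered
            wealth -= FEE
    return item, wealth

if __name__ == "__main__":
    item, wealth = run_money_pump()
    # After 6 trades the agent holds its starting item and is 6 units poorer.
    print(f"holding {item!r}, net wealth change {wealth:+.1f}")
```

The same structure applies if the "cycle" comes not from fixed intransitive preferences but from weights that drift between decisions: each trade looks like an improvement under that moment's weighting, yet the sequence as a whole loses value.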
Any actual decision is the selection of the top choice along a single dimension ("what I choose"). If that partial ranking is inconsistent, the agent is not rational.
The resolution, of course, is to recognize that humans are not rational. https://en.wikipedia.org/wiki/Dynamic_inconsistency gives some pointers to how well we know that's true. I don't have any references on the further question, and would enjoy seeing papers or writeups on what it even means for a rational agent to be "aligned" with irrational ones.