RE: my last question: after talking to Stuart, I think one way of viewing the problem with such a proposal is that the agent cares about its future expected utility, which depends on the state/history, not just the MDP.