So if you give an agent a bad prior, it can make bad decisions. This is not a new insight.
Low-probability hypotheses predicting vast rewards/punishments seem equivalent to Pascal’s Mugging. Any agent that maximizes expected utility will spend increasing amounts of resources worrying about more and more unlikely hypotheses. In the limit, it will spend all of its time and energy caring about a single arbitrary hypothesis that predicts infinite reward (like your examples), even if it has essentially zero probability.
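To make the dynamic concrete, here is a minimal toy sketch (my own construction, not anyone's formal model): the probabilities and rewards are invented for illustration, with the "mugger" hypothesis's promised reward growing faster than its probability shrinks.

```python
# Toy illustration of the Pascal's Mugging dynamic under an assumed prior:
# each step makes the mugger hypothesis less likely but promises a reward
# that grows faster, so p * r (its contribution to expected utility) diverges.

def expected_utility(hypotheses):
    """Sum of probability * reward over (probability, reward) pairs."""
    return sum(p * r for p, r in hypotheses)

# Mundane hypothesis: likely, modest reward.
mundane = (0.99, 10.0)

for k in range(1, 8):
    # Mugger hypothesis: probability halves per level of implausibility,
    # but the promised reward triples.
    mugger = (0.5 ** k, 3.0 ** k)
    total = expected_utility([mundane, mugger])
    share = (mugger[0] * mugger[1]) / total
    print(f"k={k}: P(mugger)={mugger[0]:.4f}, "
          f"EU={total:.2f}, mugger's share of EU={share:.1%}")
```

As k grows, the expected utility (and therefore the decision of a straightforward expected-utility maximizer) is dominated by the least probable hypothesis.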
I’ve argued in the past that maximizing expected utility should be abandoned. I may not have the perfect alternative, and alternatives may be somewhat ad hoc. But that’s better than just ignoring the problem.
AIXI is still optimal at doing what you told it to do: it maximizes its expected reward given the prior you give it. It’s just that what you told it to do isn’t what you really want. But we already knew that.
Oh, one interesting thing is that your example does resemble real life: if you die, you get stuck in a state where you receive no further rewards. I think this is actually desirable and solves the anvil problem. I’ve suggested this solution in the past.
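Here is a minimal sketch of what I mean (a toy discounted-reward model I made up for illustration; the state names, discount factor, and horizon are assumptions, not anything from AIXI's definition): an absorbing "dead" state that yields zero reward forever makes the self-destructive action score zero, so the agent never prefers it.

```python
# Toy sketch: a two-state world where "dead" is an absorbing, zero-reward
# state. An expected-reward maximizer avoids the action leading there.

GAMMA = 0.9      # discount factor (assumed for illustration)
HORIZON = 50     # finite lookahead

def value(state, action, steps=HORIZON):
    """Discounted expected reward of taking `action`, then acting safely."""
    if steps == 0:
        return 0.0
    if state == "dead":
        return 0.0                                   # absorbing: no reward, ever
    if action == "drop_anvil":
        return 0.0 + GAMMA * value("dead", "anything", steps - 1)
    # "safe" action: collect 1 reward and stay alive
    return 1.0 + GAMMA * value("alive", "safe", steps - 1)

print("V(alive, safe)       =", round(value("alive", "safe"), 2))
print("V(alive, drop_anvil) =", round(value("alive", "drop_anvil"), 2))
```

The self-destructive action is worth 0 while staying alive is worth roughly 10, which is the sense in which a zero-reward death state addresses the anvil problem.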
No, maximizing expected utility (still) should not be abandoned.