For the longest while I have been trying to figure out what AIXI is about. Tell me if I have got it right:
We are in an unknown world and have a utility function to maximise. For instance, we are in a Pac-Man game and are trying to get the highest score possible.
Based on the previous observations and rewards, AIXI forms different models for predicting which action will maximise future rewards. It chooses the model with the greatest rewards and a small program size.
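To make my reading concrete, here is a toy sketch in Python. It is only an illustration under heavy assumptions: the real AIXI is incomputable, considers every program on a universal Turing machine, and plans over a whole horizon of future steps, whereas this sketch uses three hand-written deterministic models and looks one step ahead. As far as I understand, AIXI also weights every model that is consistent with the history by 2^(-program length) rather than picking a single one, so the sketch does that. All names (`Model`, `length_bits`, `choose_action`, the Pac-Man tables) are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# One past interaction: (action, observation, reward).
Step = Tuple[str, str, float]


@dataclass
class Model:
    """A hypothetical deterministic world model (stand-in for a program)."""
    name: str
    length_bits: int                      # assumed program length in bits
    table: Dict[str, Tuple[str, float]]   # action -> (observation, reward)

    def predict(self, action: str) -> Tuple[str, float]:
        return self.table[action]


def consistent(model: Model, history: Sequence[Step]) -> bool:
    """A model survives only if it reproduces everything seen so far."""
    return all(model.predict(a) == (o, r) for a, o, r in history)


def choose_action(models: List[Model], history: Sequence[Step],
                  actions: Sequence[str]) -> str:
    """Pick the action with the highest prior-weighted predicted reward.

    Each surviving model is weighted by 2 ** -length_bits, so shorter
    (simpler) models count for more. One-step lookahead only.
    """
    survivors = [m for m in models if consistent(m, history)]

    def value(action: str) -> float:
        return sum(2.0 ** -m.length_bits * m.predict(action)[1]
                   for m in survivors)

    return max(actions, key=value)


# Toy Pac-Man-like setting with made-up models.
ACTIONS = ["left", "right"]
MODELS = [
    Model("pellet_left", 3, {"left": ("pellet", 1.0), "right": ("empty", 0.0)}),
    Model("pellet_right", 5, {"left": ("empty", 0.0), "right": ("pellet", 1.0)}),
    Model("ghost_left", 8, {"left": ("ghost", -1.0), "right": ("empty", 0.0)}),
]

first = choose_action(MODELS, [], ACTIONS)
second = choose_action(MODELS, [("left", "empty", 0.0)], ACTIONS)
print(first, second)
```

With an empty history the short `pellet_left` model dominates, so the sketch goes left. After observing that going left gave nothing, only `pellet_right` remains consistent, so it goes right.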