it will quickly learn that it’s not in Hell since it won’t actually receive ε reward for outputting “0”.
The example was meant to show that if it were in Heaven, it would behave as if it were in Hell (now that’s a theological point there ^_^ ). Your example is more general.
The result of the paper is that as long as AIXI gets a minimum non-zero average reward (essentially), you can make it follow that policy forever.