In theory, changing the exploration rate and changing the prior are equivalent. I think that it might be easier to decide upon an exploration rate that gives a good result for generic priors, than to be sure that generic priors have good exploration rates. But this is just an impression.
In theory, changing the exploration rate and changing the prior are equivalent.
Not really. Standard AIXI is completely deterministic, while the usual exploration strategies for reinforcement learning, such as ɛ-greedy and soft-max, are stochastic.
By changing the prior, you can make an AIXI agent explore more if it receives one set of inputs and also explore less if it receives another set of inputs. You can’t do this by changing an “exploration rate”, unless you’re using some technical definition where it’s not a scalar number?
Given arbitrary computing power and full knowledge of the actual environment, these are equivalent. But, as you point out, in practice they’re going to be different. For us, something simple like a “exploration rate” is probably more understandable for what the AIXI’s actions will look like.
In theory, changing the exploration rate and changing the prior are equivalent. I think that it might be easier to decide upon an exploration rate that gives a good result for generic priors, than to be sure that generic priors have good exploration rates. But this is just an impression.
Not really. Standard AIXI is completely deterministic, while the usual exploration strategies for reinforcement learning, such as ɛ-greedy and soft-max, are stochastic.
By changing the prior, you can make an AIXI agent explore more if it receives one set of inputs and also explore less if it receives another set of inputs. You can’t do this by changing an “exploration rate”, unless you’re using some technical definition where it’s not a scalar number?
Given arbitrary computing power and full knowledge of the actual environment, these are equivalent. But, as you point out, in practice they’re going to be different. For us, something simple like a “exploration rate” is probably more understandable for what the AIXI’s actions will look like.