I don’t think guided training is generally the right way to disabuse an AIXI agent of misconceptions we think it might acquire. What training amounts to is having the agent’s memory begin with some carefully constructed string s0. All this does is change the agent’s prior from some P based on Kolmogorov complexity to the prior P’(s) = P(s0+s | s0), where + is concatenation. If what you’re really doing is changing the agent’s prior to the one you want, you should do that knowingly and without artificial restrictions. In certain circumstances guided training might be the right method, but the general approach should be to think about what prior we want and hard-code it as effectively as possible. Taken to its natural extreme, this amounts to making an AI that works on completely different principles from AIXI.
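To make the "training is just conditioning the prior" point concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the real Kolmogorov-complexity prior is uncomputable, so a hypothetical three-hypothesis mixture of biased coins stands in for P, and the names `WEIGHTS`, `likelihood`, `prior`, `trained_prior` and `hard_coded_weights` are all made up for this example. It shows that prefixing the memory with s0 gives the same predictions as directly hard-coding a reweighted prior.

```python
from fractions import Fraction

# ASSUMPTION: a toy stand-in for the complexity-based prior P.
# Keys are P(next bit = 1) under each hypothesis, values are prior weights.
WEIGHTS = {
    Fraction(1, 10): Fraction(1, 4),
    Fraction(1, 2): Fraction(1, 2),
    Fraction(9, 10): Fraction(1, 4),
}

def likelihood(p_one, s):
    """Probability that a coin with bias p_one emits the bit string s."""
    result = Fraction(1)
    for bit in s:
        result *= p_one if bit == "1" else 1 - p_one
    return result

def prior(s, weights=WEIGHTS):
    """P(s): probability that the observation stream begins with s."""
    return sum(w * likelihood(p, s) for p, w in weights.items())

def trained_prior(s0):
    """Guided training: P'(s) = P(s0 + s | s0) = P(s0 + s) / P(s0)."""
    p_s0 = prior(s0)
    return lambda s: prior(s0 + s) / p_s0

def hard_coded_weights(s0):
    """The same change written directly as new hypothesis weights."""
    p_s0 = prior(s0)
    return {p: w * likelihood(p, s0) / p_s0 for p, w in WEIGHTS.items()}

s0 = "1111"  # hypothetical "training" string
p_prime = trained_prior(s0)
direct = hard_coded_weights(s0)

# Both routes assign identical probabilities to every continuation.
for s in ["0", "1", "01", "110"]:
    print(s, p_prime(s) == prior(s, direct))
```

In this toy setting the training string does nothing that editing the weights by hand could not do, and editing the weights by hand is not limited to the reweightings that some string s0 happens to produce.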