Actually I think this is total nonsense produced by me forgetting the difference between AIXI and Solomonoff induction.
Wait, really? I thought it made sense (although I’d contend that most people don’t think about AIXI in terms of those TMs reinforcing hypotheses, which is the point I’m making). What’s incorrect about it?
Well, now I’m less sure that it’s incorrect. I was originally imagining that, as in Solomonoff induction, the TMs more or less directly controlled AIXI’s actions, but that’s not right: there’s an expectimax. And if the TMs reinforce actions by shaping the rewards, then in the AIXI formalism you learn that immediately and throw out those TMs.
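For reference, here is the expectimax I have in mind, written from memory in roughly Hutter's notation, so treat it as a sketch rather than a careful statement. The symbols are the usual ones and are assumptions on my part: $U$ is a universal monotone TM, $q$ ranges over programs (the "TMs" above), $\ell(q)$ is program length, $m$ is the horizon, and $a_i, o_i, r_i$ are actions, observations and rewards.

```latex
a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
      \bigl[\, r_t + \cdots + r_m \,\bigr]
      \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The point is that the programs $q$ only appear in the innermost sum, as weighted predictors of observations and rewards given actions. The actions come from the outer max operators, so no individual program picks an action directly, and any $q$ whose predicted rewards disagree with the rewards actually received drops out of the sum.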
Oh, actually, you’re right (that you were wrong). I think I made the same mistake in my previous comment. Good catch.