I suspect that AIXI is misleading to think about in large part because it lacks reusable parameters—instead it just memorises all inputs it’s seen so far. Which means the setup doesn’t have episodes, or a training/deployment distinction; nor is any behaviour actually “reinforced”.
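To make "no reusable parameters" concrete, here is a rough sketch in Hutter-style notation (the alphabet symbols $\mathcal{A}$, $\mathcal{O}$, $\mathcal{R}$ and the policy name are my own labels, not anything official):

```latex
% Sketch: AIXI as a bare history-to-action map. There is no parameter
% vector updated between steps; the action at time t is recomputed from
% the raw interaction history, which is why there are no episodes and
% no weights in which behaviour could be "reinforced".
\[
  \pi_{\mathrm{AIXI}} : (\mathcal{A}\times\mathcal{O}\times\mathcal{R})^{*} \to \mathcal{A},
  \qquad
  a_t = \pi_{\mathrm{AIXI}}\!\left(a_1 o_1 r_1 \,\ldots\, a_{t-1} o_{t-1} r_{t-1}\right).
\]
```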
I kind of think the lack of episodes makes it more realistic for many problems, but admittedly not for simulated games. Also, presumably many of the component Turing machines have reusable parameters and reinforce behaviour, although this is hidden by the formalism. [EDIT: I retract the second sentence]
Actually I think this is total nonsense produced by me forgetting the difference between AIXI and Solomonoff induction.
Wait, really? I thought it made sense (although I’d contend that most people don’t think about AIXI in terms of those TMs reinforcing hypotheses, which is the point I’m making). What’s incorrect about it?
Well, now I’m less sure that it’s incorrect. I was originally imagining that, as in Solomonoff induction, the TMs basically directly controlled AIXI’s actions, but that’s not right: there’s an expectimax. And if the TMs reinforce actions by shaping the rewards, in the AIXI formalism you learn that immediately and throw out those TMs.
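To spell out the expectimax point (a sketch of the standard AIXI definition; the universal machine $U$, program length $\ell(q)$ and horizon $m$ are my notation, with actions and percepts up to time $t-1$ taken as fixed):

```latex
% Sketch of AIXI's action choice. The programs q appear only inside the
% mixture under the expectation; actions are chosen by the outer
% expectimax, not by the programs themselves.
\[
  a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
        \bigl(r_t + \cdots + r_m\bigr)
        \sum_{q \,:\, U(q,\, a_{1:m}) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)} .
\]
% The inner sum ranges only over programs that reproduce the observed
% history, past rewards included. So a program whose "shaped" rewards
% contradict what was actually received contributes nothing from then
% on; in posterior terms its weight drops to zero:
\[
  w_q^{(t)} \;\propto\; 2^{-\ell(q)}\,
  \mathbf{1}\!\left[\, U(q,\, a_{<t}) = o_1 r_1 \ldots o_{t-1} r_{t-1} \,\right].
\]
```

which is the sense in which reward-shaping TMs get thrown out immediately rather than getting to reinforce anything.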
Oh, actually, you’re right (that you were wrong). I think I made the same mistake in my previous comment. Good catch.
Humans don’t have a training / deployment distinction either… Do humans have “reusable parameters”? Not quite sure what you mean by that.
Yes we do: training is our evolutionary history, deployment is an individual lifetime. And our genomes are our reusable parameters.
Unfortunately I haven’t yet written any papers/posts really laying out this analogy, but it’s pretty central to the way I think about AI, and I’m working on a bunch of related stuff as part of my PhD, so hopefully I’ll have a more complete explanation soon.
Oh, OK, I see what you mean. Possibly related: my comment here.