It’s not so much ignoring observations as testing models that allow your sense data to be subject to both Gaussian noise and systematic errors, i.e. explaining part of the observations away as sensory fuzziness.
In such a case, an overly simple model that posits, e.g., some systematic error in its sensors may have an advantage over an actually correct but more complex model, because the length penalty of the Universal Prior accumulates so rapidly.
Imagine AIXI coming to the conclusion that the string it is watching is in fact partly output by a random string generator that intermittently takes over. If the competing (but potentially correct) model that works without such a random string generator needs just a megabit more space to specify, do the math.
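To make "do the math" concrete, here is a rough worked version. It assumes the Universal Prior weights a program of length ℓ bits by 2^(−ℓ), and it takes a megabit as 10^6 bits. The prior odds of the longer model against the shorter one are then:

$$\frac{P(\text{complex})}{P(\text{simple})} = \frac{2^{-(\ell + 10^{6})}}{2^{-\ell}} = 2^{-10^{6}} \approx 10^{-301030}$$

Posterior odds are prior odds times the likelihood ratio on the observations x:

$$\frac{P(\text{complex} \mid x)}{P(\text{simple} \mid x)} = 2^{-10^{6}} \cdot \frac{P(x \mid \text{complex})}{P(x \mid \text{simple})}$$

So the observations have to favor the complex model by more than a million bits of likelihood, meaning a log2 likelihood ratio above 10^6, before it overtakes the simple-plus-noise model.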
I’ll still have to think about it further. It’s just not something to be dismissed out of hand. It is one of several highly relevant tangents, since it pertains to real-world applicability: if it’s a byproduct of the design, it might well carry over to any Monte Carlo approximation or other formulation. It might well turn out to be a non-issue.
Does AIXI admit the possibility of random string generators? IIRC it only allows deterministic programs. So if it sees patterns a simple model can’t match, it’s forced to extend the model with “but there are exceptions: bit N is 1, and bit N+1 is 1, and bit N+2 is 0… etc.” to account for the error. In other words, the size of the “simple model” grows to the size of the deterministic part plus the size of the error-correction part. In that case, even a megabyte of additional complexity would stop effectively ruling out the complex model as soon as more than a couple of megabytes of simple-model-incompatible data had been seen.
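Here is a minimal sketch of that argument, under a few assumptions. Each incompatible bit costs a fixed number of bits to hard-code: one bit per bit for a literal run like “bit N is 1, bit N+1 is 1, …”, and more if positions must be spelled out. The complex model needs no exceptions. All sizes and function names are hypothetical. Under a 2^(−length) prior, the log2 prior odds are just the difference in program lengths, so the shared deterministic core cancels.

```python
# Hypothetical illustration: a deterministic "simple" model patched with
# literal exceptions versus a longer model that needs no exceptions.
# Sizes are made up; only the arithmetic is the point.

MEGABYTE_BITS = 8 * 1_000_000


def patched_simple_length(core_bits: int, mismatched_bits: int,
                          bits_per_exception: int = 1) -> int:
    """Program length once every incompatible bit is hard-coded as an exception."""
    return core_bits + bits_per_exception * mismatched_bits


def log2_odds_complex_over_simple(core_bits: int, extra_complex_bits: int,
                                  mismatched_bits: int,
                                  bits_per_exception: int = 1) -> int:
    """log2 of P(complex) / P(patched simple) under a 2**-length prior.

    Positive means the complex model now carries more prior weight.
    """
    simple = patched_simple_length(core_bits, mismatched_bits, bits_per_exception)
    complex_ = core_bits + extra_complex_bits
    return simple - complex_


def crossover(extra_complex_bits: int, bits_per_exception: int = 1) -> int:
    """Fewest mismatched bits at which the complex model is no longer disfavoured."""
    return -(-extra_complex_bits // bits_per_exception)  # ceiling division


if __name__ == "__main__":
    core = 10_000  # hypothetical size of the shared deterministic core
    for seen in (0, MEGABYTE_BITS // 2, MEGABYTE_BITS, 2 * MEGABYTE_BITS):
        print(seen, log2_odds_complex_over_simple(core, MEGABYTE_BITS, seen))
```

With one bit per exception, the crossover lands at exactly one megabyte of mismatched data, which is within the “couple of megabytes” above. A higher per-exception overhead makes the patched model grow faster, so it moves the crossover earlier, not later.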
IANAE, but doesn’t AIXI work based on prediction rather than explanation? An algorithm that attempts to “explain away” sense data will be unable to predict the AI’s next input, and will be discarded.
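The “discarded” part can be sketched with a toy, finite stand-in for the Solomonoff mixture. The real mixture ranges over all programs and is incomputable. In a Bayesian mixture over deterministic predictors weighted by 2^(−length), a hypothesis that mispredicts even one observed bit has likelihood zero and drops out. The survivors’ weights are then renormalized. The hypothesis names, their lengths, and `mixture_predict` are all hypothetical.

```python
from fractions import Fraction
from typing import Callable, Dict, List, Tuple

# A hypothesis is a deterministic predictor: given the bits seen so far, it
# outputs the next bit. The int paired with it is a made-up "program length".
Predictor = Callable[[List[int]], int]


def mixture_predict(hypotheses: Dict[str, Tuple[int, Predictor]],
                    history: List[int]) -> Fraction:
    """P(next bit = 1) under a 2**-length mixture of the surviving hypotheses."""
    total = Fraction(0)
    ones = Fraction(0)
    for length, predict in hypotheses.values():
        # A deterministic hypothesis that got any past bit wrong has likelihood 0.
        if any(predict(history[:i]) != bit for i, bit in enumerate(history)):
            continue
        weight = Fraction(1, 2 ** length)
        total += weight
        if predict(history) == 1:
            ones += weight
    if total == 0:
        raise ValueError("no surviving hypothesis")
    return ones / total


HYPOTHESES: Dict[str, Tuple[int, Predictor]] = {
    "all_zeros": (3, lambda h: 0),
    "alternate": (5, lambda h: len(h) % 2),  # 0, 1, 0, 1, ...
    "all_ones": (3, lambda h: 1),
}

print(mixture_predict(HYPOTHESES, []))         # -> 4/9
print(mixture_predict(HYPOTHESES, [0, 1, 0]))  # -> 1 (only "alternate" survives)
```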
If your agent operates in an environment where your sense data contains errors, or where the world that generates that sense data isn’t deterministic (at least not at a level your sense data can pick up), and neither can be avoided, then perfect predictability is out of the question anyway.
The problem then shifts to “how much error or fuzziness in the sense data or the underlying world is allowed”. At that point there’s a trade-off between a “short and enormously more preferred model that predicts more errors/fuzziness” and a “longer and enormously less preferred model that predicts fewer errors/fuzziness”.
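One way to put rough numbers on this trade-off is a two-part minimum description length (MDL) calculation. It stands in for the Universal Prior here; it is not AIXI’s actual machinery. The total cost is model bits + n × h(p), where n is the number of observations and h is the binary entropy of the error rate p that the model attributes to noise. The sketch assumes errors are independent and that each model’s stated error rate matches how often its predictions are actually wrong, so n × h(p) is an expected code length. All numbers and names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Expected bits per observation to encode independent errors at rate p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def total_bits(model_bits: float, error_rate: float, n: float) -> float:
    """Two-part code length: model description plus n observations' error corrections."""
    return model_bits + n * binary_entropy(error_rate)


def break_even_n(short_bits: float, short_err: float,
                 long_bits: float, long_err: float) -> float:
    """Observations needed before the longer, less noisy model stops losing."""
    gain_per_obs = binary_entropy(short_err) - binary_entropy(long_err)
    if gain_per_obs <= 0:
        return math.inf  # the longer model never catches up
    return (long_bits - short_bits) / gain_per_obs


if __name__ == "__main__":
    # Hypothetical: a 100-bit model at 10% error vs. one a megabit longer at 1%.
    n_star = break_even_n(100, 0.10, 1_000_100, 0.01)
    print(round(n_star))
```

With those made-up figures, the longer model only breaks even after roughly 2.6 million observations. Before that point, the short, noisier model has the smaller total code length and therefore more weight.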
As far as I know, this is not an often-discussed topic, at least not around here. That is probably because nobody has yet hooked up a computable version of AIXI to sensors that are relevantly imperfect and that probe a truly probabilistic environment. Those concerns don’t really apply to learning Pac-Man.
Solomonoff induction never ignores observations.
One-liners, eh?
Nesov is right.