If we could somehow represent the knowledge we have (both explicit and implicit) in a format that integrates nicely with the way the AIXI-approximating program stores its own knowledge, then we could “bring it up to speed” with where we are, and let it learn from there.
Too many restrictions there, I think. The format doesn’t have to be nice: any format it doesn’t already know, it will know after a fixed-length penalty. We could just dump in the Internet Archive raw.
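The “fixed-length penalty” here is the usual invariance theorem from algorithmic information theory (notation mine): for a universal machine $U$ and any computable description format with reference machine $V$,

$$K_U(x) \;\le\; K_V(x) + c_{U,V},$$

where the constant $c_{U,V}$ is roughly the length of a decoder for that format and does not depend on the data $x$. So an arbitrary dump format costs at most a fixed overhead relative to a “nice” one.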
The requirement that it perform optimally on average (probability weighted) over all computable environments consistent with its history means that it has to spread some of its performance across those other environments, which can still be drastically different.
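Spelling out “optimally on average (probability weighted)” in standard notation (mine, and suppressing the action sequence for brevity): the agent’s predictions and plans are evaluated under the universal mixture over computable environments,

$$\xi(x_{1:t}) \;=\; \sum_{\nu \in \mathcal{M}} 2^{-K(\nu)}\,\nu(x_{1:t}),$$

so every environment $\nu$ not yet contradicted by the history keeps a posterior share of the agent’s expected performance, weighted by its prior $2^{-K(\nu)}$ times how well it has predicted the history so far.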
But those other environments where it performs poorly are the ones where it ought to perform poorly: its best performance is reserved for the most likely future histories, just as we would aspire to.
We would perform poorly in an anti-Occamian universe just like it would, but we’re far from optimal and so would perform worse in other scenarios, I would think. I suppose we could be so biased and incorrect that we luck out and our biases and errors are just right, but is it plausible that we could luck out enough to overcome the general performance difference?
Alright, the thing I meant by “nice” needs some elaboration. I’ll put it this way: for an Ultimate AI, all of its knowledge, with no exceptions, is in terms of constraints on what it expects to observe. (And yes, this is what rationalists should strive for too.) So there is no “light is waves”, no “that’s the Doppler effect”. There are only mappings from inputs to probability distributions on future inputs. (Confusingly, this would also mean an expectation for [phenomena explainable as] humans [generating sound waves explainable by them] saying [something we would recognize as the statement] “light is waves”. Phew!)
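A minimal sketch of that representational stance, with hypothetical names of my own choosing (nothing here is an actual AIXI interface): a piece of “knowledge” is just a map from the input history to a probability distribution over the next input.

```python
from typing import Callable, Dict, Sequence

# Illustrative sketch: "knowledge" as a map from the agent's input history to
# a probability distribution over its next input. All names and observation
# strings here are hypothetical, chosen only to make the idea concrete.
PredictiveKnowledge = Callable[[Sequence[str]], Dict[str, float]]

def doppler_as_prediction(history: Sequence[str]) -> Dict[str, float]:
    """'The Doppler effect' cashed out as a constraint on expected inputs,
    not as a symbolic fact about waves."""
    if history and history[-1] == "siren_source_approaching":
        return {"hear_higher_pitch": 0.95, "hear_lower_pitch": 0.05}
    return {"hear_higher_pitch": 0.5, "hear_lower_pitch": 0.5}

print(doppler_as_prediction(["siren_source_approaching"]))
# {'hear_higher_pitch': 0.95, 'hear_lower_pitch': 0.05}
```

On this view, “light is waves” would have to be compiled into a (much larger) family of such mappings before it counts as knowledge the agent can use.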
Any large human-generated knowledge base initially appears to AIXI as a long string of characters and/or some input/output black box. What in its input space do the characters refer to? What is the appropriate way to group them? Most importantly, after being told that “this stuff is true” or “this is a lot of what goes on in the environment that computes your inputs”, how does it know how it maps to the rest of the environment’s generating function? (Which I guess is ultimately the same as the first question.)
That problem is nearly as intractable as starting from just an Occamian prior. It’s only resolved by symbol-grounding, which means representing knowledge in the form of a probability distribution on observations, in a way your program can understand. Which I think brings you back into the AI-complete realm.
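Roughly, in the mixture notation above (my formalization of the point): conditioning on a dumped string $d$ changes predictions only through Bayes,

$$\xi(x \mid d) \;=\; \frac{\sum_{\nu} w_\nu\, \nu(d)\, \nu(x \mid d)}{\sum_{\nu} w_\nu\, \nu(d)},$$

and that reweighting helps only if the short programs that make $d$ likely are also the ones generating the agent’s future inputs. Establishing that link between the characters in the dump and the input-generating environment is exactly the grounding work described above.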
Okay, you’re right about the performance comparison; I wasn’t keeping the baselines straight. If you could give the program enough knowledge, in a form it understands or could quickly learn, to distinguish this computable environment from substantively different environments (including its location within it, the relevant history, etc.), then yes, it would make better inferences than humans.
But the point stands that dumping any kind of knowledge base on a computable version of AIXI won’t help you a bit until you’ve done a lot more of the cognitive labor.