That’s the problem with Kolmogorov complexity: it is the shortest program given unlimited compute. And it will spend any amount of compute to get a shorter program.
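For reference, the standard definition, relative to a fixed universal machine $U$, makes that explicit: the minimization is over program length alone, and nothing charges for runtime:

$$K_U(x) = \min\{\, |p| : U(p) = x \,\}$$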
I don’t see why it’s assumed that we’d necessarily be searching for the most concise models rather than, say, optimizing for CPU cycles or memory consumption. I’m thinking of something like Charles Bennett’s Logical Depth.
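For concreteness, one common formulation of Bennett’s depth at significance level $s$ is, roughly, the least running time of any program within $s$ bits of the minimal description (my paraphrase, so treat the exact form as an approximation):

$$\mathrm{depth}_s(x) = \min\{\, t(p) : U(p) = x,\ |p| \le K_U(x) + s \,\}$$

where $t(p)$ is the number of steps $U$ takes to run $p$; unlike plain $K_U$, runtime enters explicitly.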
These types of approaches also take it for granted that we’re conducting an exhaustive search of model-space, which, yes, is ludicrous. Of course we’d burn through our limited compute trying to brute-force the space. There’s plenty of room for improvement in a stochastic search of models which, while still expensive, at least puts us in the realm of the physically possible. There might be something to be said for working primarily on the problem of probabilistic search in large, discrete spaces before we even turn to the problem of modeling reality.
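As a concrete (and entirely toy) illustration of what a stochastic search over a discrete model space can look like, here is a short Metropolis-style sketch in Python: models are subsets of polynomial features, scored by an MDL-flavored trade-off between model size and fit. The feature set, scoring constants, and all names are my own assumptions for the sketch, not anything proposed in the thread.

```python
# Toy Metropolis search over a discrete model space (bitmasks of polynomial
# features), scored by an MDL-flavored objective: bits for the model plus
# bits for the residuals. Purely illustrative; constants are arbitrary.
import math
import random
import numpy as np

rng = random.Random(0)
x = np.linspace(-1, 1, 200)
y = 1.5 * x**3 - 0.5 * x + np.random.default_rng(0).normal(0, 0.05, x.size)

MAX_DEGREE = 8  # candidate features x**0 .. x**8, so 2**9 = 512 possible models

def score(mask):
    """Description-length proxy: 32 bits per kept feature plus residual bits."""
    cols = [x**d for d in range(MAX_DEGREE + 1) if mask & (1 << d)]
    if not cols:
        return float("inf")
    A = np.stack(cols, axis=1)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(np.sum((A @ coef - y) ** 2))
    return 32.0 * len(cols) + 0.5 * x.size * math.log2(sse / x.size + 1e-12)

def metropolis(steps=5000, temperature=2.0):
    """Random-walk Metropolis over bitmasks: flip one feature per step."""
    mask = rng.randrange(1, 2 ** (MAX_DEGREE + 1))
    current = score(mask)
    best_mask, best = mask, current
    for _ in range(steps):
        proposal = mask ^ (1 << rng.randrange(MAX_DEGREE + 1))
        new = score(proposal)
        if new < current or rng.random() < math.exp(-(new - current) / temperature):
            mask, current = proposal, new
            if current < best:
                best_mask, best = mask, current
    return best, best_mask

best_score, best_mask = metropolis()
print("best score:", round(best_score, 1),
      "degrees kept:", [d for d in range(MAX_DEGREE + 1) if best_mask & (1 << d)])
```

The point is only that a propose/accept loop over a discrete space is cheap per step and physically realizable; all of the real difficulty lives in the proposal distribution and the scoring function.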
(Standard Model equations + initial Big Bang conditions); that’s radical data efficiency …
Allow me to indulge in a bit of goal-post shifting.
A dataset like that gives us the entire Universe, i.e. Earth and a vast amount of stuff we probably don’t care about. There might come a point where I care about the social habits of a particular species in the Whirlpool Galaxy, but right now I’m much more concerned about the human world. I’m far more interested in datasets that primarily give us our world, and through which the fundamental workings of the Universe can be surmised. That’s why I nominated the VIX as a simple, human/Earth-centric dataset that perhaps holds a great amount of extractable information.
rather than, say, optimizing for CPU cycles or memory consumption
As I pointed out, we already do. And it turns out that you need to optimize more for CPU/memory, past the kilobytes of samples which are already flabby and unnecessary from the point of view of KC. And more. And more. Go right past ‘megabyte’ without even stopping. Still way too small, way too compute/memory-hungry. And a whole bunch more beyond that. And then you hit the Hutter Prize size, and that’s still too optimized for sample-efficiency, and we need to keep going. Yes, blow through ‘gigabyte’, and then more, more, and some more—and eventually, a few orders of magnitude of sample-inefficiency later, you begin to hit projects like GPT-3 which are finally getting somewhere, having traded away enough sample-efficiency (hundreds of gigabytes of data) to bring the compute requirements down into the merely mortal realm.
A dataset like that gives us the entire Universe, i.e. Earth and a vast amount of stuff we probably don’t care about.
You can locate the Earth in relatively few bits of information. Off the top of my head: the observable universe is only ~45 billion lightyears in radius; how many bits could an index into that possibly take? Roughly 36 bits (log2 of 45b) to encode distance from origin in lightyears, and maybe another ~75 bits to encode direction at comparable resolution: call it ~110 bits for such a crude encoding, giving an upper bound. You need to locate the Earth in time as well? Another ~32 bits or so to pin down which year out of ~4.5b years. If you can do KC at all, another ~150 bits or so shouldn’t be a big deal...
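To make the arithmetic easy to check, here is a short Python back-of-the-envelope for those bit counts; the 45-billion-lightyear radius and ~4.5-billion-year window come from the estimate above, while the 1-lightyear and 1-year resolutions are assumptions plugged in purely for the sake of the sketch (far finer than actually needed).

```python
# Back-of-the-envelope bit counts for locating the Earth in space and time.
# Assumptions (not from the thread): 1-lightyear spatial resolution and
# 1-year temporal resolution.
from math import log2, pi

R_LY = 45e9        # radius of the observable universe, in lightyears
WINDOW_YR = 4.5e9  # how many years we need to index into

radial_bits = log2(R_LY)               # ~35.4 bits: distance from origin in 1-ly steps
angular_bits = log2(4 * pi * R_LY**2)  # ~74.4 bits: direction, ~1-ly^2 patches on the outermost sphere
time_bits = log2(WINDOW_YR)            # ~32.1 bits: which year

total = radial_bits + angular_bits + time_bits
print(f"space: {radial_bits + angular_bits:.0f} bits, time: {time_bits:.0f} bits, "
      f"total: {total:.0f} bits")  # roughly 110 + 32 ≈ 142 bits
```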