In ML terms, nearly all of the informational work of learning what “apple” means must be performed by unsupervised learning, not supervised learning. Otherwise, the number of labeled examples required would be far too large to match toddlers’ actual performance.
I’d guess the vast majority of the work (relative to the max-entropy baseline) is done by the inductive bias.
You don’t need to guess; it’s clearly true. Even a 1 trillion parameter network where each parameter is represented with 64 bits can still only represent at most 2^64,000,000,000,000 different functions, which is a tiny tiny fraction of the full space of 2^(2^8,000,000) possible functions. You’re already getting at least 2^8,000,000 − 64,000,000,000,000 of the bits just by choosing the network architecture.
(This does assume things like “the neural network can learn the correct function rather than a nearly-correct function” but similarly the argument in the OP assumes “the toddler does learn the correct function rather than a nearly-correct function”.)
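As a back-of-the-envelope check of that counting argument, here is a small Python sketch. The 8,000,000-bit input size and the 1-trillion-parameter, 64-bits-per-parameter network are the figures used above; the variable names and the rest are just my arithmetic restating them.

```python
# Rough sanity check of the counting argument above.
input_bits = 8_000_000               # bits per input in the example discussed
param_bits = 1_000_000_000_000 * 64  # bits needed to write down all network parameters

# An arbitrary binary function on 8,000,000-bit inputs assigns one output bit to
# each of the 2^8,000,000 possible inputs, so specifying it takes 2^8,000,000 bits
# (and there are 2^(2^8,000,000) such functions in total).
bits_to_specify_arbitrary_function = 2 ** input_bits

# The trained network is fully determined by its 64 trillion parameter bits, so it
# can represent at most 2^(64,000,000,000,000) distinct functions.
bits_to_specify_network = param_bits

# Whatever the parameters don't pin down is pinned down "for free" by the choice
# of architecture, i.e. by the inductive bias.
bits_from_inductive_bias = bits_to_specify_arbitrary_function - bits_to_specify_network

# The result is still an ~8,000,000-bit number, i.e. still on the order of
# 2^8,000,000 -- the parameters' 64 trillion bits barely dent it.
print(bits_from_inductive_bias.bit_length())  # 8000000
```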
See also Superexponential Conceptspace, and Simple Words, from the Sequences:

By the time you’re talking about data with forty binary attributes, the number of possible examples is past a trillion—but the number of possible concepts is past two-to-the-trillionth-power. To narrow down that superexponential concept space, you’d have to see over a trillion examples before you could say what was In, and what was Out. You’d have to see every possible example, in fact.

[...]

From this perspective, learning doesn’t just rely on inductive bias, it is nearly all inductive bias—when you compare the number of concepts ruled out a priori, to those ruled out by mere evidence.
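The figures in the quoted passage can be checked directly; the short Python sketch below is my own restatement of that arithmetic, not part of the quote.

```python
# Sanity check of the numbers in the quoted passage: with 40 binary attributes
# there are 2^40 possible examples, and a "concept" is an arbitrary subset of
# them, so there are 2^(2^40) possible concepts.
n_attributes = 40
n_examples = 2 ** n_attributes   # 1,099,511,627,776 -- "past a trillion"

# Each example is either In or Out of a concept, independently, so specifying a
# concept takes 2^40 bits: "two-to-the-trillionth-power" concepts.
log2_n_concepts = n_examples

print(f"examples: {n_examples:,}")
print(f"concepts: 2^{log2_n_concepts:,}")
```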