You don’t need to guess; it’s clearly true. Even a 1 trillion parameter network where each parameter is represented with 64 bits can still only represent at most $2^{64{,}000{,}000{,}000{,}000}$ different functions, which is a tiny tiny fraction of the full space of $2^{2^{8{,}000{,}000}}$ possible functions. You’re already getting at least $2^{8{,}000{,}000} - 64{,}000{,}000{,}000{,}000$ of the bits just by choosing the network architecture.
(This does assume things like “the neural network can learn the correct function rather than a nearly-correct function”, but similarly the argument in the OP assumes “the toddler does learn the correct function rather than a nearly-correct function”.)
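To make the counting explicit, here is a quick sketch in Python. The 8,000,000-bit input size is an assumption read off from the exponents above, and everything is kept as exact integers, since the quantities involved are far beyond what floats can represent.

```python
# Counting argument: how much of the choice of function is made by the
# architecture versus by the learned parameters.

input_bits = 8_000_000        # assumed input size, read off from the exponents above
n_params = 10**12             # 1 trillion parameters
bits_per_param = 64

# An arbitrary binary function on input_bits-bit inputs needs one output bit
# per possible input, i.e. 2**input_bits bits to write down in full.
bits_to_specify_any_function = 2**input_bits          # = 2^8,000,000

# The trained parameters can carry at most this many bits of that choice.
bits_from_parameters = n_params * bits_per_param      # = 64,000,000,000,000

# The rest is fixed the moment you pick the architecture.
bits_from_architecture = bits_to_specify_any_function - bits_from_parameters

print(f"bits from parameters:   {bits_from_parameters:,}")
print(f"bits from architecture: roughly 2**{bits_from_architecture.bit_length()}")
# The parameters account for a vanishingly small share of the full specification:
print(f"parameter share of the bits: about 2**-{input_bits - bits_from_parameters.bit_length()}")
```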
See also Superexponential Concept Space, and Simple Words, from the Sequences:

> By the time you’re talking about data with forty binary attributes, the number of possible examples is past a trillion—but the number of possible concepts is past two-to-the-trillionth-power. To narrow down that superexponential concept space, you’d have to see over a trillion examples before you could say what was In, and what was Out. You’d have to see every possible example, in fact.
>
> [...]
>
> From this perspective, learning doesn’t just rely on inductive bias, it is nearly all inductive bias—when you compare the number of concepts ruled out a priori, to those ruled out by mere evidence.
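For concreteness, the arithmetic behind the quoted passage can be written out as a small Python check; the only figures used are the ones named in the quote (forty binary attributes, one In/Out label per example).

```python
# The counting in the quoted passage, spelled out for 40 binary attributes.

attributes = 40

# Every assignment of the 40 attributes is a distinct possible example.
possible_examples = 2**attributes          # 1,099,511,627,776: "past a trillion"

# A concept is an arbitrary In/Out labelling of all possible examples, so there
# are 2**(2**40) concepts, and pinning one down exactly takes 2**40 bits.
bits_to_pin_down_a_concept = possible_examples

# Each labelled example supplies at most one bit (it rules out at most half of
# the remaining concepts), so identifying an arbitrary concept takes at least
# 2**40 labelled examples: every possible example.
examples_needed = bits_to_pin_down_a_concept

print(f"possible examples:                           {possible_examples:,}")
print(f"labelled examples needed, in the worst case: {examples_needed:,}")
```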