Wait but they see a ton of images that they aren’t told contain apples, right? Surely that should count. (Probably not 2^big_number bits tho)
Yes! There are two ways that can be relevant. First, a ton of bits presumably come from unsupervised learning of the general structure of the world. That part also carries over to natural abstractions/minimal latents: the big pile of random variables from which we’re extracting a minimal latent is meant to represent things like all those images the toddler sees over the course of their early life.
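For concreteness, here’s roughly the setup I have in mind (sketched from memory, so treat the exact formulation as approximate rather than as the official definition from the posts): given observables $X_1, \dots, X_n$ (e.g. all those images), a latent $\Lambda$ mediates when $P[X_1, \dots, X_n \mid \Lambda] = \prod_i P[X_i \mid \Lambda]$, and it’s minimal when any other mediating latent $\Lambda'$ determines it, i.e. $H(\Lambda \mid \Lambda') = 0$. The unsupervised bits go into estimating latents like $\Lambda$; the parent’s word then only needs to pick out a latent the toddler has already learned.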
Second, sparsity: most of the images/subimages which hit my eyes do not contain apples. Indeed, most images/subimages which hit my eyes do not contain instances of most abstract object types. That fact could either be hard-coded in the toddler’s prior, or learned insofar as the toddler is already learning all these natural latents in an unsupervised way and can notice the sparsity. So, when a parent says “apple” while there’s an apple in front of the toddler, sparsity dramatically narrows down the space of things they might be referring to.
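To put a toy number on how much sparsity buys, here’s a minimal sketch (all the quantities are made up for illustration, and real scenes obviously aren’t uniform random draws). If only ~10 of ~10,000 object types are present in any given scene, then each utterance of “apple” in an apple-containing scene is worth roughly $\log_2(10000/10) \approx 10$ bits, and intersecting candidates across a few utterances pins down the referent:

```python
import math
import random

random.seed(0)

# Toy numbers, assumed purely for illustration.
NUM_OBJECT_TYPES = 10_000  # abstract object types the learner already has
OBJECTS_PER_SCENE = 10     # types present in a typical scene (sparsity)

object_types = range(NUM_OBJECT_TYPES)
apple = 0  # the referent the parent has in mind

def scene_containing(target):
    """A scene = the small set of object types present in it."""
    scene = set(random.sample(object_types, OBJECTS_PER_SCENE))
    scene.add(target)  # the parent says "apple" only when an apple is present
    return scene

# Cross-situational learning: intersect candidate referents across scenes.
candidates = set(object_types)
for utterance in range(1, 5):
    candidates &= scene_containing(apple)
    bits = math.log2(NUM_OBJECT_TYPES / len(candidates))
    print(f"utterance {utterance}: {len(candidates)} candidates left "
          f"(~{bits:.1f} bits vs. the full space)")
```

After one or two utterances the candidate set is already down to a handful, which is the sense in which sparsity does most of the work and the label itself only needs to contribute a few bits.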