paulfchristiano comments on Learning the prior

paulfchristiano 6 Jul 2020 3:30 UTC
LW: 4 AF: 2
> I’m not totally sure what actually distinguishes f and Z, especially once you start jointly optimizing them. If f incorporates background knowledge about the world, it can do better at prediction tasks. Normally we imagine f having many more parameters than Z, and so being more likely to squirrel away extra facts, but if Z is large then we might imagine it containing computationally interesting artifacts like patterns that are designed to train a trainable f on background knowledge in a way that doesn’t look much like human-written text.
f is just predicting P(y|x, Z), it’s not trying to model D. So you don’t gain anything by putting facts about the data distribution in f—you have to put them in Z so that it changes P(y|x,Z).
> Now, maybe you can try to ensure that Z is at least somewhat textlike via making sure it’s not too easy for a discriminator to tell from human text, or requiring it to play some functional role in a pure text generator, or whatever.
The only thing Z does is get handed to the human for computing P(y|x,Z).
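One way to write down the division of labor described here, as a sketch only. The notation is assumed rather than taken from this comment: $P_H$ stands for the human's (or amplified human's) prediction after reading $Z$, $\mathrm{Prior}_H$ for the human's prior over background knowledge, and $D$ for the labeled dataset.

```latex
\begin{align*}
Z^{*} &= \arg\max_{Z}\; \Big[ \log \mathrm{Prior}_H(Z)
         + \sum_{(x_i, y_i) \in D} \log P_H(y_i \mid x_i, Z) \Big] \\
f(y \mid x, Z) &\approx P_H(y \mid x, Z)
  \quad \text{(fit to the human's answers, not to the labels in } D\text{)}
\end{align*}
```

Under this reading, facts about the data distribution can only raise the score by changing $Z$, since $f$ is never scored against $D$ directly.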
- Charlie Steiner 6 Jul 2020 23:11 UTC
  LW: 2 AF: 1
  Ah, I think I see, thanks for explaining. So even when you talk about amplifying f, you mean a certain way of extending human predictions to more complicated background information (e.g. via breaking down Z into chunks and then using copies of f that have been trained on smaller Z), not fine-tuning f to make better predictions. Or maybe some amount of fine-tuning for “better” predictions by some method of eliciting its own standards, but not by actually comparing it to the ground truth.
  This (along with eventually reading your companion post) also helps resolve the confusion I was having over what exactly was the prior in “learning the prior”—Z is just like a latent space, and f is the decoder from Z to predictions. My impression is that your hope is that if Z and f start out human-like, then this is like specifying the “programming language” of a universal prior, so that search for highly-predictive Z, decoded through f, will give something that uses human concepts in predicting the world.
  Is that somewhat in the right ballpark?
  - paulfchristiano 7 Jul 2020 0:59 UTC
    LW: 2 AF: 1
    > So even when you talk about amplifying f, you mean a certain way of extending human predictions to more complicated background information (e.g. via breaking down Z into chunks and then using copies of f that have been trained on smaller Z), not fine-tuning f to make better predictions.
    That’s right, f is either imitating a human, or it’s trained by iterated amplification / debate—in any case the loss function is defined by the human. In no case is f optimized to make good predictions about the underlying data.
    > My impression is that your hope is that if Z and f start out human-like, then this is like specifying the “programming language” of a universal prior, so that search for highly-predictive Z, decoded through f, will give something that uses human concepts in predicting the world.
    Z should always be a human-readable (or amplified-human-readable) latent; it will necessarily remain human-readable because it has no purpose other than to help a human make predictions. f is going to remain human-like because it’s predicting what the human would say (or what the human-consulting-f would say etc.).
    The amplified human is like the programming language of the universal prior, Z is like the program that is chosen (or slightly more precisely: Z is like a distribution over programs, described in a human-comprehensible way) and f is an efficient distillation of the intractable ideal.
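The roles described in this thread can be illustrated with a toy sketch. Everything in it is hypothetical: the candidate strings for Z, the hard-coded `human_predict` and `human_log_prior` stand-ins, and the lookup-table "distillation" are assumptions made for illustration, not the actual proposal. The point it tries to show is only that Z is selected for predictiveness as judged through the human, while f is fit to the human's answers and never to the labels.

```python
import math

# Hypothetical candidate values of Z: short, human-readable background claims.
Z_EVEN = "labels are True when x is even"
Z_POSITIVE = "labels are True when x is positive"
CANDIDATE_Z = [Z_EVEN, Z_POSITIVE]


def human_log_prior(z):
    # Assumed stand-in for the human's prior over Z: uniform over candidates.
    return math.log(1.0 / len(CANDIDATE_Z))


def human_predict(x, z):
    """Assumed stand-in for the human: P(y = True | x, Z) after reading Z."""
    if z == Z_EVEN:
        return 0.9 if x % 2 == 0 else 0.1
    if z == Z_POSITIVE:
        return 0.9 if x > 0 else 0.1
    return 0.5


def log_score(z, data):
    # log Prior(Z) + sum of log P(y | x, Z), all evaluated by the "human".
    total = human_log_prior(z)
    for x, y in data:
        p = human_predict(x, z)
        total += math.log(p if y else 1.0 - p)
    return total


def select_z(data):
    # The "program" Z is chosen to be predictive of the data.
    return max(CANDIDATE_Z, key=lambda z: log_score(z, data))


def distill_f(inputs):
    # f imitates the human's answers; it is never compared to labels in D.
    table = {(x, z): human_predict(x, z) for x in inputs for z in CANDIDATE_Z}

    def f(x, z):
        return table[(x, z)]

    return f


D = [(2, True), (3, False), (-4, True), (5, False)]
best_z = select_z(D)
f = distill_f(range(-5, 6))
print(best_z, f(4, best_z))
```

In this toy, swapping the dataset changes which Z is selected but leaves f untouched, which mirrors the claim that f is not optimized to make good predictions about the underlying data.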