The loss function is computed by comparing the model's prediction on a training instance to the training label. The loss function is undefined after training. What does it mean for the system to minimize the loss function while generating?
Sorry, I didn’t understand the question (and what you meant by “The loss function is undefined after training.”).
After thinking about this more, I now think that my original description of this failure mode might be confusing: maybe it is more accurate to describe it as an inner optimizer problem. The guiding logic here is that if there are no inner optimizers, then the question-answering system, which was trained by supervised learning, "attempts" (during inference) to minimize the expected value of the loss function as defined by the original distribution from which the training examples were sampled; any other goal system is the result of inner optimizers.
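A minimal toy sketch of the distinction being drawn (this example is my own illustration, not from the exchange above): during training the loss compares a prediction to a label, while at inference no label exists, so "minimizing expected loss" can only mean outputting whatever would score best in expectation under the training distribution.

```python
import math

def cross_entropy(pred_probs, label):
    # The loss is only defined when a label is available, i.e. at training time.
    return -math.log(pred_probs[label])

# Training time: a prediction is compared to the training label.
pred = [0.7, 0.2, 0.1]          # model's predicted class probabilities
loss = cross_entropy(pred, 0)    # label 0 is available, so the loss is defined

# Inference time: there is no label, so the loss itself cannot be computed.
# A system that "minimizes expected loss" outputs the answer that would
# minimize loss in expectation under the training distribution -- here,
# simply the most probable class.
best = max(range(len(pred)), key=lambda i: pred[i])
```

On this reading, the question "what does it mean to minimize the loss while generating?" is answered by the last line: the loss value is never computed at inference; it only shapes which output the trained system produces.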
(I need to think more about this)