The term π is meant to be a posterior distribution, i.e. one chosen after seeing the data. If you have a good prior you could take π = π0; however, note that L(π0) could then be high. You want a trade-off between the cost of updating the prior and the reduction in loss.
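For concreteness, a generic McAllester-style PAC-Bayes bound makes this trade-off explicit. The exact inequality under discussion isn't quoted in the thread, so take the following (with empirical loss L̂_n, sample size n, and confidence level δ, all notation introduced here) as an illustrative sketch rather than the actual result:

```latex
% Illustrative McAllester-style PAC-Bayes bound (not necessarily the exact
% inequality under discussion; constants and log factors vary by version).
% With probability at least 1 - \delta over the sample, for all posteriors \pi:
\[
  L(\pi) \;\le\; \widehat{L}_n(\pi)
  \;+\; \sqrt{\frac{\mathrm{KL}(\pi \,\|\, \pi_0) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}.
\]
% Taking \pi = \pi_0 makes the KL term vanish but may leave \widehat{L}_n(\pi_0)
% large; moving \pi toward low empirical loss shrinks \widehat{L}_n(\pi) at the
% price of a larger KL(\pi \| \pi_0) -- the trade-off described above.
```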
For example, say we have a neural network. Then the prior π0 would be the distribution over random initializations, and the posterior π would be the distribution of the weights that SGD outputs.
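To make that concrete, here is a minimal, self-contained sketch (a toy logistic-regression stand-in with synthetic data; the model, data, and all names are made up for illustration, not taken from the thread): draws from the prior π0 are random initializations, and draws from the posterior π are the weights SGD returns when started from those initializations.

```python
# Toy illustration: prior = distribution over random initializations,
# posterior = distribution of the weights returned by SGD.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data (hypothetical stand-in for a real task).
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def sgd_from_init(w0, steps=500, lr=0.1):
    """Run plain SGD on the logistic loss starting from w0; return the final weights."""
    w = w0.copy()
    for _ in range(steps):
        i = rng.integers(n)
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))
        w -= lr * (p - y[i]) * X[i]
    return w

# Draws from the "prior" pi_0: random initializations.
prior_samples = [rng.normal(scale=1.0, size=d) for _ in range(20)]

# Draws from the "posterior" pi: the weights SGD outputs from each initialization.
posterior_samples = [sgd_from_init(w0) for w0 in prior_samples]

def avg_loss(ws):
    """Average logistic loss over a set of weight samples (Gibbs-style loss)."""
    losses = []
    for w in ws:
        p = 1.0 / (1.0 + np.exp(-X @ w))
        losses.append(-np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))
    return float(np.mean(losses))

print("loss under pi_0 (prior):    ", avg_loss(prior_samples))
print("loss under pi  (posterior): ", avg_loss(posterior_samples))
```

Running SGD from many prior draws is just one way to obtain samples from π here; the point is only that π0 is fixed before training while π depends on the data.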
(Btw thanks for the correction)
Thanks, I finally got it. What I only now fully understood is that the final inequality holds with high π0^n probability (i.e., as you say, π0 is the data distribution), while the learning bound, or loss reduction, is given for π.