Thanks, I was wondering what people referred to when mentioning PAC-Bayes bounds. I am still a bit confused. Could you explain how L(π) and ^L(π) depend on π0 (if they do), and how to interpret the final inequality in this light? I am particularly wondering because the bound seems to be tightest when π = π0. Minor comment: I think n = m?
The distribution π is meant to be a posterior, chosen after seeing the data. If you have a good prior you could indeed take π = π0, but then the loss L(π0) could be high. You want to trade off the cost of updating the prior (the KL(π‖π0) term) against the reduction in loss.
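To make the trade-off concrete, a λ-parameterized PAC-Bayes bound typically has the following shape for a loss in [0,1] (I am not assuming these are exactly the constants in the inequality above, just the generic form): with probability at least 1 − δ over the sample, simultaneously for all posteriors π,

$$
L(\pi) \;\le\; \hat L(\pi) \;+\; \frac{\mathrm{KL}(\pi \,\|\, \pi_0) + \log\frac{1}{\delta}}{\lambda} \;+\; \frac{\lambda}{8n}.
$$

Taking π = π0 makes the KL term vanish but leaves you with ^L(π0), which is usually large; moving π toward low-loss regions shrinks ^L(π) at the price of a larger KL(π‖π0).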
For example, say we have a neural network. Then the prior π0 would be (a distribution over) the initialization, and the posterior π would be the distribution of the weights that SGD outputs.
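Here is a rough numerical sketch of that picture (the Gaussian prior around the init, the Gaussian posterior around the SGD solution, the chosen σ values, and the λ-form of the bound are all illustrative assumptions, not the construction from this thread):

```python
# A toy numerical sketch of the trade-off above. The Gaussian prior/posterior,
# the sigma values, and the lambda-form of the bound are illustrative
# assumptions, not the exact construction from this thread.
import numpy as np

def kl_diag_gauss(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q^2 I) || N(mu_p, sigma_p^2 I) ) for isotropic Gaussians."""
    var_q, var_p = sigma_q ** 2, sigma_p ** 2
    return np.sum(np.log(sigma_p / sigma_q)
                  + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p)
                  - 0.5)

def pac_bayes_lambda_bound(emp_loss, kl, n, lam, delta=0.05):
    """L(pi) <= L_hat(pi) + (KL + log(1/delta))/lambda + lambda/(8n), losses in [0,1]."""
    return emp_loss + (kl + np.log(1.0 / delta)) / lam + lam / (8.0 * n)

rng = np.random.default_rng(0)
d, n = 1_000, 50_000                         # number of weights, number of data points
w_init = rng.normal(size=d)                  # prior pi_0: Gaussian around the initialization
w_sgd = w_init + 0.1 * rng.normal(size=d)    # posterior pi: Gaussian around the SGD output

kl = kl_diag_gauss(w_sgd, 0.05, w_init, 0.10)
emp_loss = 0.08                              # pretend pi achieves 8% empirical error

# lambda trades the KL/confidence term against lambda/(8n); picking it from a
# grid would require a union bound over the grid in a rigorous statement.
for lam in (100.0, 1_000.0, 10_000.0):
    print(f"lambda={lam:>8.0f}  bound={pac_bayes_lambda_bound(emp_loss, kl, n, lam):.3f}")
```

Making π more concentrated on low-loss weights lowers emp_loss but drives the KL term up, which is the trade-off in practice.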
(Btw thanks for the correction)
Thanks, I finally got it. What I just now fully understood is that the final inequality holds with high π0^n probability (i.e., as you say, π0 is the data), while the learning bound or loss reduction is given for π.
I’m still confused about the part where you use the Hoeffding inequality: how are the lambda in that step and the lambda in the loss function “the same lambda”?
Because f = λ⋅ΔL, where ΔL = L − ^L is exactly the gap that the Hoeffding step controls; the λ in the exponent there and the λ in the final bound are the same. Does that help?
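To spell that out, here is the standard chain for a loss in [0,1] (I am not claiming these are exactly the constants from above, just the structure). For a fixed hypothesis h, Hoeffding's lemma applied to f = λ(L(h) − ^L(h)) gives

$$
\mathbb{E}_{S}\!\left[e^{\lambda\left(L(h)-\hat L(h)\right)}\right] \;\le\; e^{\lambda^{2}/(8n)} .
$$

Taking the expectation over h ∼ π0, applying Markov's inequality, and then the Donsker-Varadhan change of measure gives, with probability at least 1 − δ over the sample, simultaneously for all π,

$$
\lambda\left(L(\pi)-\hat L(\pi)\right) \;\le\; \mathrm{KL}(\pi\,\|\,\pi_0) \;+\; \log\tfrac{1}{\delta} \;+\; \tfrac{\lambda^{2}}{8n},
$$

and dividing by λ > 0 gives the final inequality. The λ in the exponent of the Hoeffding step is never replaced along the way, so it is literally the same λ that appears in the bound.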