TekhneMakre comments on Six (and a half) intuitions for KL divergence

TekhneMakre 10 Oct 2022 12:35 UTC
2 points
Nice. I didn’t know about the hypothesis testing one (or Bregman, but I don’t get that one). I wonder if one can back out another description of KL divergence in terms of mutual information from the expression of mutual information in terms of KL divergence: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Mutual_information
- CallumMcDougall 10 Oct 2022 14:13 UTC
  1 point
  And yeah, despite a whole 16-lecture course on convex optimisation I still don’t really get Bregman either, I skipped the exam questions on it 😆
- CallumMcDougall 10 Oct 2022 14:11 UTC
  1 point
  Oh yeah, I hadn’t considered that one. I think it’s interesting, but the intuitions are better in the opposite direction, i.e. you can build on good intuitions for $D_{KL}$ to better understand MI. I’m not sure if you can easily get intuitions to point in the other direction (i.e. from MI to $D_{KL}$), because this particular expression has MI as an expectation over $D_{KL}$, rather than the other way around. E.g. I don’t think this expression illuminates the nonsymmetry of $D_{KL}$.
  The way it’s written here seems more illuminating (not sure if that’s the one that you meant). This gets across the idea that:
  $P_{(X,Y)}$ is the true reality, and $P_X \otimes P_Y$ is our (possibly incorrect) model which assumes independence. The mutual information between $X$ and $Y$ equals $D_{KL}(P_{(X,Y)} \,\|\, P_X \otimes P_Y)$, i.e. the extent to which modelling $X$ and $Y$ as independent (sharing no information) is a poor way of modelling the true state of affairs (where they do share information).
  But again I think this intuition works better in the other direction, since it builds on intuitions for $D_{KL}$ to better explain MI. The arguments in the $D_{KL}$ expression aren’t arbitrary (i.e. we aren’t working with $D_{KL}(P \,\|\, Q)$), which restricts the amount this can tell us about $D_{KL}$ in general.
  - TekhneMakre 10 Oct 2022 21:08 UTC
    2 points
    “The arguments in the $D_{KL}$ expression aren’t arbitrary (i.e. we aren’t working with $D_{KL}(P \,\|\, Q)$), which restricts the amount this can tell us about $D_{KL}$ in general.”
    Yeah, I was vaguely hoping one could phrase $P$ and $Q$ so they’re in that form, but I don’t see it.
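
For anyone who wants to poke at the two expressions discussed in this thread numerically, here is a minimal sketch in Python with NumPy. It assumes a small made-up joint distribution over a binary $X$ and a three-valued $Y$; the numbers and the names `kl`, `joint`, `mi_as_kl`, `mi_as_expectation` and `reverse_kl` are purely illustrative and do not come from the post. It computes MI as the divergence of the joint from the product of marginals, then as an expectation over $Y$ of the divergence of $P_{X \mid Y}$ from $P_X$, and finally swaps the arguments to show that the result changes.

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) in nats for two discrete distributions of the same shape."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0  # terms with p = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical joint distribution P_(X,Y): rows index X, columns index Y.
joint = np.array([[0.30, 0.10, 0.10],
                  [0.05, 0.25, 0.20]])

p_x = joint.sum(axis=1)        # marginal of X
p_y = joint.sum(axis=0)        # marginal of Y
product = np.outer(p_x, p_y)   # independence model P_X (x) P_Y

# MI as D_KL(P_(X,Y) || P_X (x) P_Y): "truth" first, independence model second.
mi_as_kl = kl(joint, product)

# MI as an expectation over Y of D_KL(P_{X|Y=y} || P_X).
mi_as_expectation = sum(
    p_y[j] * kl(joint[:, j] / p_y[j], p_x) for j in range(len(p_y))
)

# Swapping the arguments gives a different number, so the asymmetry of D_KL
# comes from the divergence itself, not from the MI identity.
reverse_kl = kl(product, joint)

print(mi_as_kl, mi_as_expectation, reverse_kl)
```

The first two printed values should agree up to floating-point error, while the third differs. That fits the point made above: both forms fix the arguments of the divergence to a very particular pair, so on their own they say little about $D_{KL}(P \,\|\, Q)$ for arbitrary $P$ and $Q$.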