Observation to potentially connect this to some math that people might be more familiar with: when $P$ and $Q$ are probability distributions, then $G_{x\sim P}[Q(x)] = e^{\mathbb{E}_{x\sim P}[\ln Q(x)]} = e^{-H(P,Q)}$, where $H$ is the cross-entropy.
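For a concrete sanity check of this identity, here is a minimal numpy sketch (the two distributions are arbitrary example values, not anything from the post): it estimates the geometric expectation as the geometric mean of $Q(x)$ over samples $x \sim P$, and compares that to $e^{-H(P,Q)}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two arbitrary discrete distributions over the same three outcomes
# (example values; any valid probability vectors would do).
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.2, 0.2, 0.6])

# Estimate the geometric expectation G_{x~P}[Q(x)] by sampling:
# the geometric mean of Q(x) over draws x ~ P.
xs = rng.choice(len(P), size=1_000_000, p=P)
G_estimate = np.exp(np.mean(np.log(Q[xs])))

# Cross-entropy H(P, Q) = -E_{x~P}[ln Q(x)], computed exactly.
H_PQ = -np.sum(P * np.log(Q))

print(G_estimate)     # ~0.249, up to sampling noise
print(np.exp(-H_PQ))  # 0.249..., matching e^{-H(P,Q)}
```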
Note that the cross-entropy (and thus $G_{x\sim P}[Q(x)]$) is dependent on meaningless details of which events you consider the same vs. different, but $e^{H(P,P)-H(P,Q)} = G_{x\sim P}[Q(x)]/G_{x\sim P}[P(x)] = G_{x\sim P}[Q(x)/P(x)]$ is not (as much), and when maximizing with respect to $Q$, this is the same maximization.
(I am just pointing out that KL divergence is a more natural concept than cross-entropy.)
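Indeed, the exponent $H(P,P) - H(P,Q)$ is exactly $-D_{\mathrm{KL}}(P\,\|\,Q)$, so the normalized quantity is $e^{-D_{\mathrm{KL}}(P\,\|\,Q)}$. A small sketch checking that all three expressions agree (again with arbitrary example distributions):

```python
import numpy as np

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.2, 0.2, 0.6])

def G(P, weights):
    """Geometric expectation of weights(x) under x ~ P (exact, discrete)."""
    return np.exp(np.sum(P * np.log(weights)))

kl = np.sum(P * np.log(P / Q))  # D_KL(P || Q)

print(G(P, Q) / G(P, P))  # G[Q] / G[P]
print(G(P, Q / P))        # G[Q/P] -- same value
print(np.exp(-kl))        # e^{-KL}; all three agree (~0.698)
```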
I think $e^{-H(P)}$ might also have a natural interpretation along the lines of "the probability that two consecutive samples from $P$ are equal". This holds exactly for the uniform distribution, but only approximately for the Bernoulli distribution, so it is not a perfect heuristic.
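A quick numerical check of this heuristic (a sketch; the Bernoulli parameter 0.9 is just an example): compare $e^{-H(P)}$ with the collision probability $\sum_x P(x)^2$.

```python
import numpy as np

def geo_prob(P):
    """e^{-H(P)}: the geometric expectation of P(x) under x ~ P."""
    return np.exp(np.sum(P * np.log(P)))

def collision_prob(P):
    """Probability two i.i.d. samples from P are equal: sum_x P(x)^2."""
    return np.sum(P**2)

uniform = np.full(4, 0.25)
bernoulli = np.array([0.9, 0.1])  # example Bernoulli(0.9)

print(geo_prob(uniform), collision_prob(uniform))      # 0.25 vs 0.25: exact
print(geo_prob(bernoulli), collision_prob(bernoulli))  # ~0.72 vs 0.82: only approximate
```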
The middle piece here should be $G_{x\sim P}[Q(x)]/G_{x\sim P}[P(x)]$, right?
Anyway KL-divergence is based.
Yeah, edited.