Lucius Bushnaq comments on RL with KL penalties is better seen as Bayesian inference

Lucius Bushnaq 6 Jun 2022 14:08 UTC
2 points
Is a(x) in the formulas supposed to be pi_0(x)?
- Tomek Korbak 7 Jun 2022 16:36 UTC
  1 point
  good catch, yes, thanks!