Rubi J. Hudson comments on Conditioning Predictive Models: Interactions with other approaches

Rubi J. Hudson 20 Feb 2023 2:07 UTC
Yes, you are correct that RL with KL penalties only approximates a Bayesian update in the limit, after enough steps to converge. Determining the speed of this convergence, especially for LLMs, remains an area for future work.
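A toy sketch, not part of the original comment, of what "in the limit, after enough steps" means here. It assumes the standard result that the optimum of the KL-regularized objective E_π[r] − β·KL(π‖π₀) is the tilted "posterior" π*(x) ∝ π₀(x)·exp(r(x)/β). The code runs exact-gradient ascent on a three-outcome softmax policy, starting from the prior, and measures the total-variation distance to that closed form. The prior, reward values, β, and learning rate are hypothetical choices made for illustration. Real LLM fine-tuning uses sampled, high-variance gradients over an enormous output space, so this says nothing about how fast that case converges.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tilted_posterior(prior, reward, beta):
    """Closed-form optimum of E_pi[r] - beta * KL(pi || prior): prior * exp(r / beta), normalized."""
    weights = [p * math.exp(r / beta) for p, r in zip(prior, reward)]
    z = sum(weights)
    return [w / z for w in weights]


def kl_regularized_ascent(prior, reward, beta, lr, steps):
    """Exact-gradient ascent on E_pi[r] - beta * KL(pi || prior), with pi = softmax(theta).

    Starts at the prior, analogous to fine-tuning from a pretrained model.
    """
    theta = [math.log(p) for p in prior]
    for _ in range(steps):
        pi = softmax(theta)
        # Per-outcome "regularized reward": g_x = r_x - beta * (log pi_x - log prior_x)
        g = [r - beta * (math.log(p) - math.log(q)) for r, p, q in zip(reward, pi, prior)]
        # Softmax policy gradient: dJ/dtheta_k = pi_k * (g_k - E_pi[g])
        baseline = sum(p * gx for p, gx in zip(pi, g))
        theta = [t + lr * p * (gx - baseline) for t, p, gx in zip(theta, pi, g)]
    return softmax(theta)


def total_variation(p, q):
    """Total-variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical toy setup: three outcomes, prior favors the lowest-reward one.
prior = [0.5, 0.3, 0.2]
reward = [0.0, 1.0, 2.0]
beta = 1.0
target = tilted_posterior(prior, reward, beta)

for steps in (1, 10, 100, 1000):
    pi = kl_regularized_ascent(prior, reward, beta, lr=0.5, steps=steps)
    print(steps, round(total_variation(pi, target), 6))
```

The stationary point of this ascent is exactly where g_x is equal across outcomes, which rearranges to π ∝ π₀·exp(r/β); that is why the trained policy only matches the Bayesian update once the iteration has converged, and the printed distances shrink as the step count grows.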