In the first equation under “KL-regularised RL as variational inference,” I think a(x) should be π0(x).
fixed, thanks!