I think that paper and this one are complementary. Regularizing on the state-action distribution fixes the problems with regularizing on the action distribution, but if it’s still using KL divergence, you still get the problems described in this paper. The latest version on arXiv mentions this briefly.