Charbel-Raphaël comments on Paper: In-context Reinforcement Learning with Algorithm Distillation [Deepmind]

Charbel-Raphaël 3 Nov 2022 7:11 UTC
2 points
This post argues there is no inner mesa-optimizer here:
https://www.lesswrong.com/posts/avvXAvGhhGgkJDDso/caution-when-interpreting-deepmind-s-in-context-rl-paper