SoerenMind comments on Risks from Learned Optimization: Introduction

SoerenMind 24 Jun 2019 19:22 UTC
21 points
This recent DeepMind paper seems to claim that they found a mesa-optimizer. E.g., suppose their LSTM observes an initial state. You can let the LSTM ‘think’ about what to do by feeding it that state multiple times in a row. The more time it has to think, the better it acts. It has more properties like that. It’s a pretty standard LSTM, so part of their point is that this is common.

https://arxiv.org/abs/1901.03559v1
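
To make the "feed it the initial state multiple times" idea concrete, here is a minimal sketch of the mechanism. It is not the paper's architecture or code: `TinyLSTM`, `act_after_thinking`, the dimensions, and the random untrained weights are all hypothetical, chosen only for illustration. Because the weights are untrained, this shows how the extra "thinking" steps are wired in, not that they improve behavior.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTM:
    """Hypothetical single-layer LSTM with random, untrained weights."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        # One stacked weight matrix for the input, forget, output and candidate gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, obs_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        # Linear readout from the hidden state to action logits.
        self.W_policy = rng.normal(0.0, 0.1, size=(n_actions, hidden_dim))

    def step(self, obs, h, c):
        """Standard LSTM cell update for a single time step."""
        z = self.W @ np.concatenate([obs, h]) + self.b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

    def policy_logits(self, h):
        return self.W_policy @ h


def act_after_thinking(model, initial_obs, n_think_steps):
    """Feed the same initial observation repeatedly, then read out an action.

    n_think_steps = 0 is the ordinary case: one LSTM step, then act.
    Each extra step re-presents the unchanged observation to the network.
    """
    h = np.zeros(model.hidden_dim)
    c = np.zeros(model.hidden_dim)
    for _ in range(n_think_steps + 1):
        h, c = model.step(initial_obs, h, c)
    logits = model.policy_logits(h)
    return int(np.argmax(logits)), logits


model = TinyLSTM(obs_dim=8, hidden_dim=16, n_actions=4)
initial_obs = np.ones(8)
for k in (0, 1, 5):
    action, logits = act_after_thinking(model, initial_obs, k)
    print(k, action, np.round(logits, 3))
```

The environment never advances during the extra steps; only the recurrent state changes. In a trained agent, the claim would be tested by checking whether task performance goes up as `n_think_steps` increases.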