This post primarily argues that a phenomenon is evidence for [learned models being likely to encode search algorithms]
I do mention interpreting the described results as tentative evidence for mesa-optimization, and this interpretation was why I wrote the post; my impression is still that this interpretation was basically correct. But most of the post is just quotes or paraphrased claims made by DeepMind researchers, rather than my own claims, since I didn’t feel sure enough to make the claims myself.
I do mention interpreting the described results as tentative evidence for mesa-optimization, and this interpretation was why I wrote the post; my impression is still that this interpretation was basically correct. But most of the post is just quotes or paraphrased claims made by DeepMind researchers, rather than my own claims, since I didn’t feel sure enough to make the claims myself.