I would not call that mesa-optimization and would not take it as evidence that mesa-optimization is the "default" for powerful ML systems. That paper describes a model with subagents, where each subagent does optimization. Ways in which this differs from mesa-optimization:
- Given an input, a mesa-optimizer would run on that input once; in this model, ten different optimizations (one per digit class) are run in order to classify each image.
- The base objective is "correctly map an image of a digit to its label"; the objective of the dth optimizer in the model is "maximize the Evidence Lower Bound (ELBO) on the log likelihood of the image under a generative model for digit d" (see the sketch after this list). These objectives have the wrong type signature and don't agree with the base objective on the training distribution, whereas a mesa-optimizer's objective would.
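To make the structure concrete, here is a minimal sketch (in PyTorch) of that kind of per-class inference loop: for each digit class, optimize variational parameters to maximize the ELBO of the input image under that class's generative model, then predict the class with the highest ELBO. The decoders, dimensions, and optimizer settings below are placeholders for illustration, not the paper's actual architecture.

```python
import torch

N_CLASSES, LATENT_DIM, IMG_DIM = 10, 8, 784

# One (untrained, placeholder) generative model per digit class.
decoders = [torch.nn.Linear(LATENT_DIM, IMG_DIM) for _ in range(N_CLASSES)]

def elbo(image, mu, logvar, decoder):
    # Single-sample Monte Carlo ELBO: Gaussian likelihood (unit variance),
    # standard-normal prior, constants dropped.
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    log_lik = -0.5 * ((image - decoder(z)) ** 2).sum()
    kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum()
    return log_lik - kl

def classify(image, steps=100, lr=0.1):
    best_class, best_score = None, -float("inf")
    for d in range(N_CLASSES):
        # A separate inner optimization per class -- the "ten different
        # optimizations per input" referred to above.
        mu = torch.zeros(LATENT_DIM, requires_grad=True)
        logvar = torch.zeros(LATENT_DIM, requires_grad=True)
        opt = torch.optim.Adam([mu, logvar], lr=lr)
        for _ in range(steps):
            loss = -elbo(image, mu, logvar, decoders[d])
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            score = elbo(image, mu, logvar, decoders[d]).item()
        if score > best_score:
            best_class, best_score = d, score
    return best_class

# With untrained decoders the prediction is meaningless; this only shows the control flow.
print(classify(torch.rand(IMG_DIM)))
```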
Note that I do think mesa-optimization will be common; I just don't think that paper is evidence for the claim.