I think of mesaoptimization as concerning primarily because it would mean models (selected using amortized optimization) doing their own direct optimization, and the extent to which the model is itself doing its own “direct” optimization vs. just being “amortized” is what I would call the optimizer-controller spectrum (see also this post).
Also, it seems kind of inaccurate to declare that (non-RL) ML systems are fundamentally amortized optimization, then to say things like “more computation and better algorithms should improve safety and the primary risk comes from misgeneralization” and “amortized approaches necessarily have poor sample efficiency asymptotically”, and to only add a mention of mesaoptimizers in a postscript.
In my ontology, this corresponds to saying “current ML systems are very controller-y.” But the thing I’m worried about is that eventually we’re going to figure out how to build models which are in fact more optimizer-y, for the same reasons people are trying to build AGI in the first place (though I do think there is a non-trivial chance that controller-y systems are in fact good enough to help us solve alignment, this is not something I’d bet on as a default!).
Relatedly, the fact that amortized optimizers are “just” modelling a distribution doesn’t make them inherently more benign. Everything can be phrased as distribution modelling! I think the core confusion here might be conflating the generalization properties of current ML architectures/methods with the type signature of ML. Moving the issue to data quality doesn’t fix the problem either; everything is “just” a data problem, too (if only we had the dataset indicating exactly how the superintelligent AGI should solve alignment, then we could simply behavior clone that dataset).
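To make the “everything can be phrased as distribution modelling” point concrete, here is a minimal toy sketch (my own illustration, not anything from the original post): a direct argmax optimizer is just the zero-temperature limit of sampling from a Boltzmann distribution over outcomes, so saying a system “only models a distribution” doesn’t by itself constrain how optimizer-y it is.

```python
# Toy illustration (assumed example, not from the post): direct optimization
# over a finite candidate set re-expressed as sampling from a distribution.
import numpy as np

rng = np.random.default_rng(0)

def reward(x):
    # Arbitrary objective a "direct optimizer" would argmax.
    return -(x - 3.7) ** 2

candidates = np.linspace(0.0, 10.0, 1001)

# Direct optimization: just take the argmax.
direct_choice = candidates[np.argmax(reward(candidates))]

# Distributional phrasing: sample from p(x) proportional to exp(R(x) / T).
def boltzmann_sample(temperature):
    logits = reward(candidates) / temperature
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(candidates, p=probs)

# As T -> 0, the "distribution modeller" collapses onto the direct optimizer.
for T in [1.0, 0.1, 0.001]:
    print(f"T={T}: sample={boltzmann_sample(T):.2f}")
print(f"argmax: {direct_choice:.2f}")
```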
Everything can be phrased as distribution modelling.
That might be a big claim, since Beren thinks there’s a real difference in type; one example is that he thinks alignment solutions for model-based agents coming out of GPT-N can’t work, due to amortized optimization. So the restriction is non-vacuous.