Out of curiosity, this morning I did a literature search about “hard-coded optimization” in the gradient-based learning space—that is, people deliberately setting up “inner” optimizers in their neural networks because it seems like a good way to solve tasks. (To clarify, I don’t mean deliberately trying to make a general-purpose architecture learn an optimization algorithm, but rather, baking an optimization algorithm into an architecture and letting the architecture learn what to do with it.)
Why is this interesting?
The most compelling arguments in Risks from Learned Optimization that mesa-optimizers will appear involve competitiveness: incorporating online optimization into a policy can help with generalization, compression, etc.
If inference-time optimization really does help competitiveness, we should expect to see some of the relevant competitors trying to do it on purpose.
I recall some folks saying in 2019 that the apparent lack of this seemed like evidence against the arguments that mesa-optimizers will be competitive.
To the extent there is now a trend toward explicit usage of inference-time optimizers, that supports the arguments that mesa-optimizers would be competitive, and thus may emerge accidentally as general-purpose architectures scale up.
More importantly (and also mentioned in Risks from Learned Optimization, as “hard-coded optimization”), if the above arguments hold, then it would help safety to bake in inference-time optimization on purpose, since we can better control and understand optimization when it’s engineered—assuming that engineering it doesn’t sacrifice task performance (so that the incentive for the base optimizer to evolve a de novo mesa-optimizer is removed).
So, engineered inference-time optimization is plausibly one of those few capabilities research directions that counts as (weakly) differential tech development, in the sense that it accelerates safe AI more than it accelerates unsafe AI (although it accelerates both). I’m not confident enough about this to say that it’s a good direction to work on, but it does seem like a good direction to be aware of and occasionally glance at.
My impression is that a majority of AI alignment/safety agendas/proposals from the last few years have carried a standard caveat that they don’t address the inner alignment problem at all, or at least not deceptive alignment in particular.
As far as I can tell, there are only a few public directions for addressing deceptive alignment:
Myopia (probably trading off competitiveness)
Penalizing complexity (description, computational, etc.; seems to be rough consensus this won’t work)
Transparency via inspection
Adversarial training (aka transparency via training)
Hard-coded optimization (aka transparency via architecture)
The ELK report has an intriguing paragraph suggesting that solutions to ELK might transfer to deceptive alignment via an analogy about “honesty”, but my sense is that solving ELK for learned optimizers (which has its own, much longer section of the report) will likely require additional new ideas beyond a solution to the “base case” of ELK (although I do agree with the report that the latter is currently a more exciting research direction than thinking explicitly about learned optimizers).
Although hard-coding optimization certainly doesn’t rule out learned optimization, I’m optimistic that it may be an important component of a suite of safety mechanisms (combined with “transparency via ELK” and perhaps one or two other major ideas, which may not be discovered yet) that finally rule out deceptive alignment.
Anyway, here’s (some of) what I found:
Task-independent work
Cameron, Chris, Jason Hartford, Taylor Lundy, and Kevin Leyton-Brown. “The Perils of Learning Before Optimizing.” AAAI 2022, December 16, 2021
The introduction here is a great explanation of why ML engineers want to incorporate an optimizer into a learning system, rather than learning a predictive model and then applying an engineered optimizer.
But their main theoretical claims rest on a concept of “predictive model” that only outputs point estimates, rather than distributions (and so cannot represent correlations between the predicted variables).
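For intuition, here’s a toy illustration of that limitation (my own, not from the paper): when two predicted quantities are anti-correlated, their marginal point estimates hide structure that a downstream optimizer would want to exploit.

```python
import numpy as np

# Toy illustration (mine, not the paper's): two demands are perfectly
# anti-correlated, so their total is always exactly 1. A downstream
# optimizer fed only the marginal point estimates (0.5, 0.5) can't see
# that stocking 1 unit is risk-free; a distributional prediction can.
rng = np.random.default_rng(0)
d1 = rng.integers(0, 2, size=10_000)   # demand for item 1: 0 or 1
d2 = 1 - d1                            # demand for item 2: anti-correlated
total = d1 + d2
print(d1.mean(), d2.mean())            # point estimates: about 0.5 and 0.5
print(total.mean(), total.std())       # distributional fact: total is always 1.0
```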
Agrawal, Akshay, Brandon Amos, Shane Barratt, Stephen Boyd, et al. “Differentiable Convex Optimization Layers.” NeurIPS 2019, October 28, 2019
This was my starting point—a very neat piece of work from the Boyd group (probably the top convex-optimization lab in US academia), where the cone program solver in CVXPY is adapted to not only calculate solutions for the forward pass, but also pull gradient covectors in the solution space back to gradient covectors on the problem specification for the backward pass.
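For a flavor of the API, here’s a minimal sketch using their cvxpylayers library (a toy problem of my own choosing, not one from the paper): a differentiable projection-onto-the-simplex layer.

```python
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

# Toy layer: project an input vector y onto the probability simplex.
# The forward pass solves the cone program; the backward pass pulls
# gradients on the solution back to gradients on the parameter y.
n = 5
x = cp.Variable(n)
y = cp.Parameter(n)
problem = cp.Problem(cp.Minimize(cp.sum_squares(x - y)),
                     [cp.sum(x) == 1, x >= 0])
layer = CvxpyLayer(problem, parameters=[y], variables=[x])

y_t = torch.randn(n, requires_grad=True)
(x_star,) = layer(y_t)        # differentiable solve
x_star[0].backward()          # gradient of one solution coordinate w.r.t. the input
print(y_t.grad)
```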
Butler, Andrew, and Roy Kwon. “Efficient Differentiable Quadratic Programming Layers: An ADMM Approach.” ArXiv:2112.07464, December 14, 2021
A recent development of a much more efficient algorithm for the special case of differentiating through a quadratic program solver.
Mensch, Arthur, and Mathieu Blondel. “Differentiable Dynamic Programming for Structured Prediction and Attention.” ICML 2018, February 20, 2018
A way to approximate dynamic programming as convex programming—for combinatorial optimization and belief propagation, though, rather than reinforcement learning.
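As a rough illustration of the flavor (my own toy sketch, not the paper’s exact operators): replacing the max in a Viterbi-style recursion with a temperature-scaled log-sum-exp makes the DP value differentiable in the scores.

```python
import torch

# Toy smoothed dynamic program: the max over paths is replaced by
# log-sum-exp, so the value is differentiable in the per-step and
# transition scores, and its gradient gives soft path marginals.
def smoothed_viterbi(emit, trans, temp=1.0):
    # emit: (T, S) per-step state scores; trans: (S, S) transition scores
    alpha = emit[0]
    for t in range(1, emit.shape[0]):
        alpha = emit[t] + temp * torch.logsumexp(
            (alpha.unsqueeze(1) + trans) / temp, dim=0)
    return temp * torch.logsumexp(alpha / temp, dim=0)

emit = torch.randn(6, 4, requires_grad=True)
trans = torch.randn(4, 4)
value = smoothed_viterbi(emit, trans)
value.backward()
print(emit.grad)   # soft occupancy of each state at each step
```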
Gilton, Davis, Gregory Ongie, and Rebecca Willett. “Deep Equilibrium Architectures for Inverse Problems in Imaging.” ArXiv:2102.07944, June 2, 2021
A very nifty recipe for back-propagating through a fixed-point algorithm (adapting the same fixed-point algorithm to calculate the backward pass).
Also has task-specific parts for reconstructing 2D and 3D images.
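A minimal sketch of the recipe (my own simplified version of the standard deep-equilibrium trick, not the paper’s imaging setup): run a fixed-point iteration with gradients off in the forward pass, then reuse the same iteration in the backward pass to solve the adjoint fixed-point equation.

```python
import torch
import torch.nn as nn

def forward_iteration(g, z0, iters=60):
    # Naive fixed-point solver; the point is that the *same* solver
    # computes both the forward equilibrium and the backward pass.
    z = z0
    for _ in range(iters):
        z = g(z)
    return z

class FixedPointLayer(nn.Module):
    # Sketch assumes f(z, x) is a contraction in z, so both iterations converge,
    # and that z and x have the same shape (for the zero initialization).
    def __init__(self, f, solver=forward_iteration):
        super().__init__()
        self.f, self.solver = f, solver

    def forward(self, x):
        with torch.no_grad():                     # forward: find z* = f(z*, x)
            z = self.solver(lambda z: self.f(z, x), torch.zeros_like(x))
        z = self.f(z, x)                          # re-attach z* to the graph
        z0 = z.clone().detach().requires_grad_()
        f0 = self.f(z0, x)
        def backward_hook(grad):                  # backward: solve g = J^T g + grad
            return self.solver(
                lambda y: torch.autograd.grad(f0, z0, y, retain_graph=True)[0] + grad,
                grad)
        if z.requires_grad:
            z.register_hook(backward_hook)
        return z

# Usage on a toy contraction f(z, x) = 0.5 * tanh(z) + x
layer = FixedPointLayer(lambda z, x: 0.5 * torch.tanh(z) + x)
x = torch.randn(4, requires_grad=True)
layer(x).sum().backward()
print(x.grad)
```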
NLP
Thayaparan, Mokanarangan, Marco Valentino, Deborah Ferreira, Julia Rozanova, and André Freitas. “∂-Explainer: Abductive Natural Language Inference via Differentiable Convex Optimization.” ArXiv:2105.03417, May 17, 2021
Adapts an old-school symbolic-reasoning-style method based on integer linear programming to be a convex differentiable layer that sits on top of a 110M-parameter BERT (a rough sketch of the pattern appears below).
Training the two together:
unsurprisingly yields a huge improvement over the old-school method by itself,
a noticeable improvement over the BERT by itself, and
even a marginal improvement over a much larger (340M-parameter) BERT by itself.
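The general pattern, roughly (a hypothetical sketch of my own, not the paper’s actual formulation, which is more involved): relax the integer selection variables to a box-constrained convex program, add a small quadratic term so the argmin varies smoothly, and let gradients flow back into the encoder’s scores.

```python
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

# Hypothetical sketch: "select at most k supporting facts" as a relaxed,
# differentiable layer over relevance scores from an upstream encoder.
n_facts, k = 6, 2
z = cp.Variable(n_facts)                 # relaxed from {0,1} to [0,1]
s = cp.Parameter(n_facts)                # scores from the encoder (e.g. BERT)
prob = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(z) - s @ z),   # quadratic term keeps the argmin smooth
    [z >= 0, z <= 1, cp.sum(z) <= k])
select = CvxpyLayer(prob, parameters=[s], variables=[z])

scores = torch.randn(n_facts, requires_grad=True)   # stand-in for encoder output
(z_hat,) = select(scores)
loss = -z_hat[0]            # stand-in downstream loss
loss.backward()             # gradients reach `scores`, i.e. the encoder
```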
Robotics
Zhong, Yaofeng Desmond, Biswadip Dey, and Amit Chakraborty. “Extending Lagrangian and Hamiltonian Neural Networks with Differentiable Contact Models.” NeurIPS 2021, November 12, 2021
Here, a differentiable convex optimization layer is used as a subproblem of a physics predictor, specifically for contact dynamics—based on the physical principle that contact forces maximize the rate of energy dissipation. The physics predictor is then wrapped in a gradient-based optimization to compute optimal actions starting from a given initial state.
This is a neat motivating example for a mesa-mesa-optimizer: layer 1 wants a good policy, layer 2 wants a good action, and layer 3 wants a good prediction (where physics itself can sometimes best be predicted using an optimizer).
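A minimal sketch of that nesting (illustrative stand-ins, not the paper’s model): an outer gradient loop optimizes actions by differentiating through a predictor, which in the paper would itself contain a convex contact-force solve.

```python
import torch

def predict_next_state(state, action):
    # Stand-in for the differentiable physics predictor; in the paper this
    # step would include an inner convex optimization over contact forces.
    return state + 0.1 * action - 0.05 * state

def plan(state0, goal, horizon=10, steps=300, lr=0.1):
    # Outer optimizer: gradient descent on the action sequence,
    # backpropagating through the rolled-out predictor.
    actions = torch.zeros(horizon, state0.shape[0], requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        s = state0
        for a in actions:
            s = predict_next_state(s, a)
        loss = ((s - goal) ** 2).sum()
        loss.backward()
        opt.step()
    return actions.detach()

plan(torch.zeros(3), torch.ones(3))
```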
Agrawal, Akshay, Shane Barratt, Stephen Boyd, and Bartolomeo Stellato. “Learning Convex Optimization Control Policies.” ArXiv:1912.09529, December 19, 2019
This is a follow-up from the Boyd group where they apply their idea to a bunch of toy problems. Most compelling is section 5.4 on vehicle control; the financial portfolio and supply-chain problems are also interesting.
Srikanth, Shashank, Mithun Babu, Houman Masnavi, Arun Kumar Singh, Karl Kruusamäe, and K. Madhava Krishna. “Fast Adaptation of Manipulator Trajectories to Task Perturbation By Differentiating through the Optimal Solution.” ArXiv:2011.00488, November 1, 2020
This isn’t actually building a hard-coded optimizer into a gradient-based learning architecture—instead, it’s better described as building a bit of gradient-based learning into a hard-coded optimizer. Tangential but still interesting.
Energy systems
Chen, Bingqing, Zicheng Cai, and Mario Bergés. “Gnu-RL: A Precocial Reinforcement Learning Solution for Building HVAC Control Using a Differentiable MPC Policy.” In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, 316–25. BuildSys ’19. New York, NY, USA: Association for Computing Machinery, 2019
This takes an intriguing approach to something like ELK: an optimal-control policy, parameterized by unknowns in the actual dynamics of the building/HVAC system, is trained with imitation learning from expert demonstrations and thereby learns the parameters of the system dynamics.
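A toy sketch of that mechanism (my own simplification, not Gnu-RL’s actual differentiable MPC): define the policy as the argmin of a one-step control objective under learnable linear dynamics, then fit the dynamics parameters by imitating expert actions. The appeal for ELK-like purposes is that what gets learned lands in a structured, legible slot (the dynamics model) rather than in opaque weights.

```python
import torch

torch.manual_seed(0)
n, m = 3, 2
A_true, B_true = 0.3 * torch.randn(n, n), torch.randn(n, m)
Q, R = torch.eye(n), 0.1 * torch.eye(m)

def policy(x, A, B):
    # argmin_u (A x + B u)^T Q (A x + B u) + u^T R u, in closed form;
    # differentiable in the dynamics parameters A, B.
    H = R + B.T @ Q @ B
    return -torch.linalg.solve(H, B.T @ Q @ (A @ x))

# Imitation learning: fit A_hat, B_hat so the induced policy matches the expert.
A_hat = torch.randn(n, n, requires_grad=True)
B_hat = torch.randn(n, m, requires_grad=True)
opt = torch.optim.Adam([A_hat, B_hat], lr=1e-2)
for _ in range(2000):
    x = torch.randn(n)
    u_expert = policy(x, A_true, B_true)          # stand-in for demonstrations
    loss = ((policy(x, A_hat, B_hat) - u_expert) ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
# A_hat, B_hat now encode (an estimate of) the system dynamics.
```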
Chen, Yize. “Learning to Operate a Sustainable Power System.” PhD thesis, University of Washington, 2021.
Convex optimization layers are applied to solve for efficient power generation/transmission choices, while the neural-network layers are used to model the unpredictable output of renewable sources.
Here are some other potentially relevant papers I haven’t processed yet:
Ma, Hengbo, Bike Zhang, Masayoshi Tomizuka, and Koushil Sreenath. “Learning Differentiable Safety-Critical Control Using Control Barrier Functions for Generalization to Novel Environments.” ArXiv:2201.01347, January 7, 2022. http://arxiv.org/abs/2201.01347.
Rojas, Junior, Eftychios Sifakis, and Ladislav Kavan. “Differentiable Implicit Soft-Body Physics.” ArXiv:2102.05791, September 9, 2021. http://arxiv.org/abs/2102.05791.
Srinivas, Aravind, Allan Jabri, Pieter Abbeel, Sergey Levine, and Chelsea Finn. “Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control.” In Proceedings of the 35th International Conference on Machine Learning, 4732–41. PMLR, 2018. https://proceedings.mlr.press/v80/srinivas18b.html.