I suspect a lot of the disagreement might be about whether LLMs are something like consistent / context-independent optimizers of e.g. some utility function (which they seem very unlikely to be), not whether they’re capable of optimization in various (e.g. prompt-dependent, problem-dependent) contexts.
The top comment also seems to be conflating whether a model is capable of (e.g. sometimes, in some contexts) mesa-optimizing and whether it is (consistently) mesa-optimizing. I interpret the quoted original definition as being about the second, which LLMs probably aren’t doing, though they’re capable of the first.
This seems like the kind of ontological confusion that the Simulators post discusses at length.
If that were the intended definition, gradient descent wouldn’t count as an optimiser either. But they clearly do count it, else an optimiser gradient descent produces wouldn’t be a mesa-optimiser.
Gradient descent optimises whatever function you pass it. It doesn’t have a single set function it tries to optimise no matter what argument you call it with. If you don’t pass any valid function, it doesn’t optimise anything.
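For concreteness, here is a minimal sketch (purely illustrative; the names are made up, not from the paper) of what I mean: gradient descent is a procedure whose objective is whatever you hand it, with no fixed goal of its own.

```python
# A minimal sketch (illustrative only; names are made up) of gradient descent as a
# procedure that minimises whatever objective you pass it.

def gradient_descent(objective, x0, lr=0.1, steps=200, eps=1e-6):
    """Minimise whatever callable `objective` is passed in."""
    x = x0
    for _ in range(steps):
        # finite-difference estimate of the gradient of the supplied objective
        g = (objective(x + eps) - objective(x - eps)) / (2 * eps)
        x = x - lr * g
    return x

# The same optimiser, pointed at two different objectives:
near_three = gradient_descent(lambda x: (x - 3.0) ** 2, x0=0.0)       # ≈ 3.0
near_minus_five = gradient_descent(lambda x: (x + 5.0) ** 2, x0=0.0)  # ≈ -5.0
```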
GPT-4 will optimise pretty much whatever you prompt it to optimise. Taken by itself, without a prompt telling it to optimise something, it usually doesn’t optimise anything.
I guess you could say GPT-4, unlike gradient descent, can do things other than optimise. But if sometimes not optimising anything excluded you from being an optimiser, humans wouldn’t count as optimisers either.
So it seems to me that the paper just meant what it said in the quote. If you look through a search space to accomplish an objective, you are, at that moment, an optimiser.
Gradient descent, in this sense of the term, is not an optimizer according to Risks from Learned Optimization.
Consider that Risks from Learned Optimization talks a lot about “the base objective” and “the mesa-objective.” This only makes sense if the objects being discussed are optimization algorithms together with specific, fixed choices of objective function.
“Gradient descent” in the most general sense is—as you note—not this sort of thing. Therefore, gradient descent in that general sense is not the kind of thing that Risks from Learned Optimization is about.
Gradient descent in this general sense is a “two-argument function,” GD(f, o), where f is the thing to be optimized and o is the objective function. The objects of interest in Risks from Learned Optimization are curried single-argument versions of such functions, GD_o(f) for some specific choice of o, considered as a function of f alone.
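As a toy illustration of the currying point (a sketch only; the one-dimensional “model” f and the names below are made up, not code from the paper), the two-argument form and a curried version of it might look like:

```python
from functools import partial

# Toy sketch of the two-argument vs. curried view (names and the 1-D "model" f are
# made up for illustration; this is a gloss on the notation, not the paper's code).

def GD(f, o, lr=0.1, steps=200, eps=1e-6):
    """Two-argument form: optimize the parameter f against whichever objective o is passed."""
    for _ in range(steps):
        g = (o(f + eps) - o(f - eps)) / (2 * eps)  # gradient of the supplied o
        f = f - lr * g
    return f

# Currying: fix o once, and you get the single-argument object that talk of
# "the base objective" is about.
o_fixed = lambda f: (f - 3.0) ** 2        # a specific, fixed objective
GD_o = partial(GD, o=o_fixed)             # GD_o(f) = GD(f, o_fixed)

optimized_f = GD_o(0.0)                   # ≈ 3.0, optimized for this particular o
```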
It’s fairly common for people to say “gradient descent” when they mean GD_o for some specific o, rather than the more generic GD. This is because in practice (unless you’re doing some weird experimental thing that’s not really “gradient descent” per se), o is always fixed across the course of a run of gradient descent. When you run gradient descent to optimize an f, the result you get was not “optimized by gradient descent in general” (what would that even mean?); it was optimized for whichever o you chose, by the corresponding GD_o.
This is what licenses talking about “the base objective” when considering an SGD training run of a neural net. There is a base objective in such runs: the loss function. We know exactly what it is, because we wrote it down.
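For instance, in a toy SGD run (made-up data and model, purely for illustration), the base objective is literally a function written down before training starts, and it never changes mid-run:

```python
import random

# Toy SGD run (made-up data and model, purely illustrative): the base objective is a
# loss function we wrote down before training, and it stays fixed for the whole run.

def base_objective(w, b, x, y):
    """Squared error on one example; this is the fixed o for the entire run."""
    return (w * x + b - y) ** 2

data = [(float(x), 2.0 * x + 1.0) for x in range(10)]  # targets from y = 2x + 1
w, b, lr = 0.0, 0.0, 0.01

for _ in range(2000):
    x, y = random.choice(data)        # "stochastic": one example per step
    err = w * x + b - y
    w -= lr * 2 * err * x             # hand-derived gradients of base_objective
    b -= lr * 2 * err

mean_loss = sum(base_objective(w, b, x, y) for x, y in data) / len(data)
# w ≈ 2, b ≈ 1, mean_loss ≈ 0: the thing that got minimised is exactly base_objective.
```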
On the other hand, the notion that the optimized f’s would have “mesa-objectives” (that they would themselves be objects like GD_o, with their own unchanging o’s, rather than simply being capable of context-dependent optimization of various targets, like GPT-4 or GD) is a non-obvious claim/assumption(?) made by Risks from Learned Optimization. This claim doesn’t hold for GPT-4, and that’s why it is not a mesa-optimizer.