If that were the intended definition, gradient descent wouldn’t count as an optimiser either. But they clearly do count it; otherwise an optimiser that gradient descent produces wouldn’t be a mesa-optimiser.
Gradient descent optimises whatever function you pass it. It doesn’t have a single set function it tries to optimise no matter what argument you call it with.
Gradient descent, in this sense of the term, is not an optimizer according to Risks from Learned Optimization.
Consider that Risks from Learned Optimization talks a lot about “the base objective” and “the mesa-objective.” This only makes sense if the objects being discussed are optimization algorithms together with specific, fixed choices of objective function.
“Gradient descent” in the most general sense is—as you note—not this sort of thing. Therefore, gradient descent in that general sense is not the kind of thing that Risks from Learned Optimization is about.
Gradient descent in this general sense is a “two-argument function,” GD(f, o), where f is the thing to be optimized and o is the objective function. The objects of interest in Risks from Learned Optimization are curried single-argument versions of such functions, GD_o(f) for some specific choice of o, considered as a function of f alone.
It’s fairly common for people to say “gradient descent” when they mean GD_o for some specific o, rather than the more generic GD. This is because in practice (unless you’re doing some weird experimental thing that’s not really “gradient descent” per se), o is always fixed across the course of a run of gradient descent. When you run gradient descent to optimize an f, the result you get was not “optimized by gradient descent in general” (what would that even mean?); it was optimized for whichever o you chose, by the corresponding GD_o.
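To make the currying point concrete, here is a minimal sketch (the function names and toy objective are my own, not anything from the paper): gd plays the role of the two-argument GD(f, o), and functools.partial produces the single-argument GD_o that Risks from Learned Optimization is actually about.

from functools import partial

def gd(f_params, objective, lr=0.1, steps=100, eps=1e-6):
    """Gradient descent as a two-argument thing: the parameters to optimize
    and the objective o are both passed in explicitly."""
    params = list(f_params)
    for _ in range(steps):
        grads = []
        for i in range(len(params)):
            bumped = list(params)
            bumped[i] += eps
            # crude finite-difference gradient, enough for a toy example
            grads.append((objective(bumped) - objective(params)) / eps)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

# A specific objective o, chosen by us: squared distance to 3.0 in each coordinate.
mse_to_target = lambda p: sum((x - 3.0) ** 2 for x in p)

# GD_o: the curried, single-argument object the paper talks about.
gd_o = partial(gd, objective=mse_to_target)

print(gd_o([0.0, 10.0]))  # ~[3.0, 3.0]: optimized for this particular o, not "in general"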
This is what licenses talking about “the base objective” when considering an SGD training run of a neural net. There is a base objective in such runs: it’s the loss function; we know exactly what it is, we wrote it down.
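A similarly hypothetical sketch of that point: in an actual training run, the base objective is literally the loss function written down in the code, and it stays fixed for the whole run (the toy task and names here are illustrative only).

import random

def loss(w, data):
    """The base objective: mean squared error, written down explicitly."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

data = [(float(x), 2.0 * x) for x in range(1, 6)]  # toy task: learn y = 2x
w, lr = 0.0, 0.01
for _ in range(200):
    x, y = random.choice(data)        # the "stochastic" part of SGD
    grad = 2 * (w * x - y) * x        # gradient of the same objective, every step
    w -= lr * grad
print(w, loss(w, data))               # w ~ 2.0, low loss: optimized for *this* loss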
On the other hand, the notion that the optimized fs would have “mesa-objectives” (that they would themselves be objects like GD_o with their own unchanging os, rather than simply being capable of context-dependent optimization of various targets, like GPT-4 or GD) is a non-obvious claim, or assumption, made by Risks from Learned Optimization. This claim doesn’t hold for GPT-4, and that’s why GPT-4 is not a mesa-optimizer.