However, when a problem involves both, it seems like we have to solve the outer part of the problem (i.e. figure out what-we-even-want), and once that’s solved, all that’s left for inner alignment is imperfect-optimizer-exploitation. The reverse does not apply: we do not necessarily have to solve the inner alignment issue (other than the imperfect-optimizer-exploiting part) at all.
The way I’m currently thinking of things, I would say the reverse also applies in this case.
We can turn optimization-under-uncertainty into well-defined optimization by assuming a prior. The outer alignment problem (in your sense) involves getting the prior right. Getting the prior right is part of “figuring out what we want”. But this is precisely the source of the inner alignment problems in the Paul/Evan sense: Paul was pointing out a previously neglected issue about the Solomonoff prior, and Evan is talking about inductive biases of machine learning algorithms (which is sort of like the combination of a prior and imperfect search).
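To make that reduction concrete, here's a minimal sketch (my notation, just the standard expected-utility picture): given actions $A$, a prior $p$ over unknown states $\theta$, and a utility function $U$, optimization-under-uncertainty becomes the well-defined problem

$$a^{*} \;=\; \arg\max_{a \in A} \; \mathbb{E}_{\theta \sim p}\left[ U(a, \theta) \right] \;=\; \arg\max_{a \in A} \; \sum_{\theta} p(\theta)\, U(a, \theta),$$

so everything that can go wrong with the choice of $p$ is now baked into the objective being maximized.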
So you, Evan, and Paul all agree that there’s this problem with the prior (/ inductive biases). It is distinct from other outer alignment problems (because we can, to a large extent, factor the problem of specifying an expected-value calculation into the problem of specifying probabilities and the problem of specifying a value function / utility function / etc.). Everyone would seem to agree that this part of the problem needs to be solved. The disagreement is just about whether to classify this part as “inner” and/or “outer”.
What is this problem like? Well, it’s broadly a quality-of-prior problem, but it has a different character from other quality-of-prior problems. For the most part, the quality of priors can be understood by thinking about average error being low, or mistakes becoming infrequent, etc. However, here, this kind of thinking isn’t sufficient: we are concerned with rare but catastrophic errors. Thinking about these things, we find ourselves thinking in terms of “agents inside the prior” (or agents being favored by the inductive biases).
To what extent “agents in the prior” should be lumped together with “agents in imperfect search”, I am not sure. But the term “inner optimizer” seems relevant.
I’d be interested in a more complete explanation of what optimization-under-uncertainty would mean, other than to take an expectation (or max/min, quantile, etc) to convert it into a deterministic optimization problem.
A good example of optimization-under-uncertainty that doesn’t look like that (at least, not overtly) is most applications of gradient descent.
The true objective is not well-defined. I.e., machine learning people generally can’t write down an objective function which (a) spells out what they want, and (b) can be evaluated. (What you want is generalization accuracy on the presently-unknown deployment data.)
So, machine learning people create proxies to optimize. The loss on the training data is the start, but then you add regularization terms to penalize complex models.
But none of these proxies is the full expected value (i.e., expected generalization accuracy). If we could compute the full expected value, we probably wouldn’t be searching for a model at all! We would just use the EV calculations to make the best decision for each individual case.
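A small sketch of that gap (the `model` object and its attributes are hypothetical, purely for illustration): the quantity we can compute is a proxy, while the quantity we care about is an expectation over a deployment distribution we don't have.

```python
import numpy as np

def proxy_objective(model, X_train, y_train, lam=1e-2):
    """What we can actually optimize: empirical loss on the training data
    plus a regularization term penalizing complex models."""
    preds = model.predict(X_train)                          # hypothetical model API
    empirical_loss = np.mean((preds - y_train) ** 2)        # mean squared error on training data
    complexity_penalty = lam * np.sum(model.weights ** 2)   # e.g. an L2 penalty
    return empirical_loss + complexity_penalty

def true_objective(model):
    """What we actually want: expected loss on future deployment data.
    Not computable, because the deployment distribution is unknown at training time."""
    raise NotImplementedError("expected generalization accuracy cannot be evaluated directly")
```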
So you can see, we can always technically turn optimization-under-uncertainty into a well-defined optimization by providing a prior, but this is usually so impractical that ML people often don’t even consider what their prior might be. Even if you did write down a prior, you’d probably have to do ordinary ML search to approximate it. Which goes to show that it’s pretty hard to eliminate the non-EV versions of optimization-under-uncertainty: if you try to do real EV, you end up using non-EV methods anyway, to approximate EV.
The fact that we’re not really optimizing EV, in typical applications of gradient descent, explains why methods like early stopping or dropout (or anything else that messes with the ability of gradient descent to optimize the given objective) might be useful. Otherwise, you would only expect to use modifications if they helped the search find higher-value items. But in real cases, we sometimes prefer items that have a lower score on our proxy, when the-way-we-got-that-item gives us other reason to expect it to be good (early stopping being the clearest example of this).
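Here’s a rough sketch of that pattern, with hypothetical helper functions (`grad_step`, `train_loss`, `val_loss`) standing in for the usual training machinery: we descend on the proxy, but deliberately return an iterate with a worse proxy score when how-we-got-it (validation performance) gives us more reason to trust it.

```python
def train_with_early_stopping(model, grad_step, train_loss, val_loss,
                              max_steps=1000, patience=20):
    """Gradient descent on the training proxy, returning the iterate that
    looked best on held-out data rather than the one with the best proxy score."""
    best_model, best_val = model, val_loss(model)
    steps_since_best = 0
    for _ in range(max_steps):
        model = grad_step(model, train_loss)   # one gradient step on the proxy objective
        current_val = val_loss(model)
        if current_val < best_val:
            best_model, best_val = model, current_val
            steps_since_best = 0
        else:
            steps_since_best += 1
            if steps_since_best >= patience:   # further proxy improvement looks like overfitting
                break
    return best_model
```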
This in turn means we don’t even necessarily convert our problem to a real, solidly defined optimization problem, ever. We can use algorithms like gradient-descent-with-early-stopping just “because they work well” rather than because they optimize some specific quantity we can already compute.
Which also complicates your argument, since if we’re never converting things to well-defined optimization problems, we can’t factor things into “imperfect search problems” vs “alignment given perfect search”—because we’re not really using search algorithms (in the sense of algorithms designed to get the maximum value), we’re using algorithms with a strong family resemblance to search, but which may have a few overtly-suboptimal kinks thrown in because those kinks tend to reduce Goodharting.
In principle, a solution to an optimization-under-uncertainty problem needn’t look like search at all.
Ah, here’s an example: online convex optimization. It’s a solid example of optimization-under-uncertainty, but not necessarily thought of in terms of a prior and an expectation.
So optimization-under-uncertainty doesn’t necessarily reduce to optimization.
I claim it’s usually better to think about optimization-under-uncertainty in terms of regret bounds, rather than reducing it to maximization. (E.g., this is why Vanessa’s approach to decision theory is superior.)
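For concreteness, the standard setup (standard online-learning notation, not specific to this discussion): at each round the learner picks $x_t$ from a convex set $\mathcal{K}$, the convex loss $f_t$ is then revealed, and the goal is a sublinear regret bound

$$\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x), \qquad \mathrm{Regret}_T = o(T),$$

i.e. a worst-case guarantee relative to the best fixed point in hindsight, stated without any prior or expectation over environments.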
I’m not sure the optimization vs optimization-under-uncertainty distinction is actually all that central, though. Intuitively, the reason an objective isn’t well-defined without the data/prior is that the data/prior defines the ontology, or defines what the things-in-the-objective are pointing to (in the pointers-to-values sense) or something along those lines. If the objective function is f(X, Y), then the data/prior are what point “X” and “Y” at some things in the real world. That’s why the objective function cannot be meaningfully separated from the data/prior: “f(X, Y)” doesn’t mean anything, by itself.
But I could imagine the pointer-aspect of the data/prior could somehow be separated from the uncertainty-aspect. Obviously this would require a very different paradigm from either today’s ML or Bayesianism, but if those pieces could be separated, then I could imagine a notion of inner alignment (and possibly also something like robust generalization) which talks about both optimization and uncertainty, plus a notion of outer alignment which just talks about the objective and what it points to. In some ways, I actually like that formulation better, although I’m not clear on exactly what it would mean.
These remarks generally make sense to me. Indeed, I think the ‘uncertainty-aspect’ and the ‘search aspect’ would be rolled up into one, since imperfect search falls under the uncertainty aspect (being logical uncertainty). We might not even be able to point to which parts are prior vs search… as with “inductive bias” in ML. So inner alignment problems would always be “the uncertainty is messed up”—forcibly unifying your search-oriented view on daemons w/ Evan’s prior-oriented view. More generally, we could describe the ‘uncertainty’ part as where ‘capabilities’ live.
Naturally, this strikes me as related to what I’m trying to get at with optimization-under-uncertainty. An optimization-under-uncertainty algorithm takes a pointer, and provides all the ‘uncertainty’.
But I don’t think it should quite be about separating the pointer-aspect and the uncertainty-aspect. The uncertainty aspect has what I’ll call “mundane issues” (e.g., does it converge well given evidence, does it keep uncertainty broad w/o evidence) and “extraordinary issues” (inner optimizers). Mundane issues can be investigated with existing statistical tools/concepts. But the extraordinary issues seem to require new concepts. The mundane issues have to do with things like averages and limit frequencies. The extraordinary issues have to do with one-time events.
The true heart of the problem is these “extraordinary issues”.