Your examples in the other comment do feel closely related to your ideas on learning normativity, whereas inner agency problems do not feel particularly related to that (or at least not any more so than anything else is related to normativity).
Could you elaborate on that? I do think that learning-normativity is more about outer alignment. However, some ideas might cross-apply.
It feels like “optimization under uncertainty” is not quite the right name for the thing you’re trying to point to with that phrase, and I think your explanations would make more sense if we had a better name for it.
Well, it still seems like a good name to me, so I’m curious what you are thinking here. What name would communicate better?
It does seem like there’s an important sense in which inner agency problems are about uncertainty, in a way which could potentially be factored out, but that seems less true of the examples in your other comment. (Or to the extent that it is true of those examples, it seems true in a different way than the inner agency examples.)
Again, I need more unpacking to be able to say much (or update much).
The pointers problem feels more tightly entangled with your optimization-under-uncertainty examples than with inner agency examples.
Well, the optimization-under-uncertainty is an attempt to make a frame which can contain both, so this isn’t necessarily a problem… but I am curious what feels non-tight about inner agency.
… so I guess my main gut-feel at this point is that it does seem very plausible that uncertainty-handling (and inner agency with it) could be factored out of goal-specification (including pointers), but this particular idea of optimization-under-uncertainty seems like it’s capturing something different. (Though that’s based on just a handful of examples, so the idea in your head is probably quite different from what I’ve interpolated from those examples.)
On a side note, it feels weird to be the one saying “we can’t separate uncertainty-handling from goals” and you saying “ok but it seems like goals and uncertainty could somehow be factored”. Usually I expect you to be the one saying uncertainty can’t be separated from goals, and me to say the opposite.
I still agree with the hypothetical me making the opposite point ;p The problem is that certain things are being conflated, so both “uncertainty can’t be separated from goals” and “uncertainty can be separated from goals” have true interpretations. (I have those interpretations clear in my head, but communication is hard.)
OK, so.
My sense of our remaining disagreement…
We agree that the pointers/uncertainty could be factored (at least informally—currently waiting on any formalism).
You think “optimization under uncertainty” is doing something different, and I think it’s doing something close.
Specifically, I think “optimization under uncertainty” importantly is not necessarily best understood as the standard Bayesian thing where we (1) start with a utility function, (2) provide a prior, so that we can evaluate expected value (and 2.5, update on any evidence), and (3) provide a search method, so that we solve the whole thing by searching for the highest-expectation element. Many examples of optimization-under-uncertainty strain this model. Probably the pointer/uncertainty model would do a better job in these cases. But, the Bayesian model is kind of the only one we have, so we can use it provisionally. And when we do so, the approximation of pointer-vs-uncertainty that comes out is:
Pointer: The utility function.
Uncertainty: The search plus the prior, which in practice can blend together into “inductive bias”.
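To make that decomposition concrete, here is a minimal sketch of the standard Bayesian pipeline described above, with the utility function playing the role of the pointer and the prior-plus-search playing the role of the uncertainty side. (The function and variable names here are just illustrative, not anything from the discussion.)

```python
# Minimal sketch of the standard Bayesian pipeline described above.
# All names (utility, prior, candidates, likelihood) are illustrative.

def expected_utility(candidate, utility, prior):
    """Steps (1) and (2): score a candidate by averaging the utility function over the prior."""
    return sum(p * utility(candidate, world) for world, p in prior.items())

def bayes_update(prior, likelihood, evidence):
    """Step (2.5): condition the prior on observed evidence."""
    unnormalized = {world: p * likelihood(evidence, world) for world, p in prior.items()}
    total = sum(unnormalized.values())
    return {world: p / total for world, p in unnormalized.items()}

def optimize(candidates, utility, prior):
    """Step (3): search for the highest-expectation element.
    The utility function is the 'pointer'; the prior plus this search are the 'uncertainty' side."""
    return max(candidates, key=lambda c: expected_utility(c, utility, prior))

# Toy usage: two worlds, two candidate actions.
prior = {"rainy": 0.3, "sunny": 0.7}
payoffs = {("umbrella", "rainy"): 1.0, ("umbrella", "sunny"): 0.2,
           ("no_umbrella", "rainy"): 0.0, ("no_umbrella", "sunny"): 1.0}
utility = lambda action, world: payoffs[(action, world)]
print(optimize(["umbrella", "no_umbrella"], utility, prior))  # -> "no_umbrella"
```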
This isn’t perfect, by any means, but, I’m like, “this isn’t so bad, right?”
I mean, I think this approximation is very not-good for talking about the pointers problem. But I think it’s not so bad for talking about inner alignment.
I almost want to suggest that we hold off on trying to resolve this, and first, I write a whole post about “optimization under uncertainty” which clarifies the whole idea and argues for its centrality. However, I kind of don’t have time for that atm.