This is a very strong claim which, to my knowledge, has not been well-justified anywhere. Daniel K agreed with me the other day that there isn’t a standard reference for this claim. Do you know of one?
There isn’t a standard reference because the argument takes one sentence, and I’ve been repeating it over and over again: what would Bayesian updates on low-level physics do? Bayesian updating on the true low-level model is the unique predictor with best-possible predictive power, so we know that anything which scales up to best-possible predictive power in the limit will eventually behave the same way.
(BTW I think that link is dead)
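To make that one-sentence argument concrete, here is a minimal toy sketch in Python (my own illustration, not anything from the thread; a three-hypothesis coin is a deliberately silly stand-in for “low-level physics”): the exact Bayesian posterior predictive gets the best achievable log score, so any predictor that approaches best-possible predictive power in the limit has to converge to making the same predictions.

```python
# A toy sketch (my own illustration, not from the thread) of why exact
# Bayesian updating on the true low-level model is the predictive optimum.
# A three-hypothesis coin stands in for "low-level physics".
import math
import random

random.seed(0)

hypotheses = [0.2, 0.5, 0.8]   # candidate coin biases
true_bias = 0.8
posterior = {h: 1.0 / len(hypotheses) for h in hypotheses}  # uniform prior

def predict_heads(post):
    # Posterior-predictive probability of heads.
    return sum(p * h for h, p in post.items())

def bayes_update(post, heads):
    # Condition the posterior on one observed flip.
    unnorm = {h: p * (h if heads else 1.0 - h) for h, p in post.items()}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

bayes_score = 0.0   # cumulative log score of the Bayesian predictor
fixed_score = 0.0   # cumulative log score of a fixed 50/50 predictor

for _ in range(500):
    heads = random.random() < true_bias
    p = predict_heads(posterior)
    bayes_score += math.log(p if heads else 1.0 - p)
    fixed_score += math.log(0.5)
    posterior = bayes_update(posterior, heads)

print(f"Bayesian predictor log score: {bayes_score:.1f}")
print(f"Fixed 50/50 predictor log score: {fixed_score:.1f}")
# The Bayesian predictor's score dominates; anything that matches it in the
# limit must converge to making (asymptotically) the same predictions.
```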
My perception of your behavior in this thread is: instead of talking about whether the bridge can be extended, you changed the subject and explained that the real problem is that the bridge has to support very heavy trucks. This is logically rude, and it makes it impossible to have an in-depth discussion about whether the bridge design can actually be extended or not.
The “what would Bayesian updates on a low-level model do?” question is exactly the argument that the bridge design cannot be extended indefinitely, which is why I keep bringing it up over and over again.
This does point to one ambiguity worth noticing: the difference between “this method would produce an aligned AI” vs “this method would continue to produce aligned AI over time, as things scale up”. I am definitely thinking mainly about long-term alignment here; I don’t really care about alignment of low-power AI like GPT-3 except insofar as it’s a toy problem for alignment of more powerful AIs (or insofar as it’s profitable, but that’s a different matter).
I’ve been less careful than I should have been about distinguishing these two in this thread. All the things we’re saying “might work” are things which might work in the short term on some low-power AI, but will definitely not work in the long term on high-power AI. That’s probably part of why it seems like I keep switching positions: I haven’t been properly distinguishing whether we’re talking about the short term or the long term.
A second comment on this:
“instead of talking about whether the bridge can be extended, you changed the subject and explained that the real problem is that the bridge has to support very heavy trucks”
If we want to make a piece of code faster, the first step is to profile the code to figure out which step is the slow one. If we want to make a beam stronger, the first step is to figure out where it fails. If we want to extend a bridge design, the first step is to figure out which piece fails under load if we just elongate everything.
Likewise, if we want to scale up an AI alignment method, the first step is to figure out exactly how it fails under load as the AI’s capabilities grow.
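Concretely, for the “profile the code” clause of that analogy, here is a minimal sketch using Python’s built-in cProfile (the pipeline and its steps are made-up stand-ins, not anything from the thread): measure first, so the effort goes where the failure actually is.

```python
# A minimal "profile first" sketch using Python's built-in cProfile;
# pipeline(), slow_step(), and fast_step() are made-up stand-ins.
import cProfile
import pstats

def slow_step(n):
    # Does its work with an explicit Python-level loop, so it dominates the runtime.
    return sum(i * i for i in range(n))

def fast_step(n):
    # A cheap closed-form step for contrast.
    return n * (n - 1) // 2

def pipeline():
    slow_step(2_000_000)
    fast_step(2_000_000)

profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Sort by cumulative time: the step worth optimizing shows up at the top,
# before any effort is spent speeding things up.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```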
I think you currently do not understand the failure mode I keep pointing to by saying “what would Bayesian updates on low-level physics do?”. Elsewhere in the thread, you said that optimizing “for having a diverse range of models that all seem to fit the data” would fix the problem, which is my main evidence that you don’t understand the problem. The problem is not “the data underdetermines what we’re asking for”; the problem is “the data fully determines what we’re asking for, and we’re asking for a proxy rather than the thing we actually want”.
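To spell out that last distinction with a toy (again my own construction, not from the thread): suppose the thing we actually care about is a latent variable X, but the only signal the data contains is a sensor reading S that can be tampered with. More data pins down the answer to the proxy question (predicting S) as precisely as you like; it never turns that into the question we wanted answered (tracking X).

```python
# Toy illustration (my own construction, not from the thread) of "the data
# fully determines what we're asking for, and what we're asking for is a
# proxy": X is the thing we actually care about, S is the sensor reading we
# can observe, and the sensor can be tampered with.
import random

random.seed(0)

def sample_world():
    x = random.random() < 0.8         # the thing we actually care about (latent)
    tampered = random.random() < 0.3  # tampering makes the sensor read "fine"
    s = True if tampered else x       # the proxy signal the data contains
    return x, s

worlds = [sample_world() for _ in range(100_000)]
sensor_data = [s for _, s in worlds]  # this is all the training signal sees

# The data fully determines the proxy question: the estimate of P(S = 1)
# converges as the data grows.
p_sensor = sum(sensor_data) / len(sensor_data)

# Ground truth we only have because this is a toy: the thing we wanted, P(X = 1).
p_actual = sum(x for x, _ in worlds) / len(worlds)

print(f"proxy target   P(S=1) ~ {p_sensor:.3f}")   # roughly 0.86
print(f"actual target  P(X=1) ~ {p_actual:.3f}")   # roughly 0.80
# More data sharpens the answer to the proxy question; it does not turn
# predicting the sensor into tracking X.
```

In this toy, keeping a diverse range of models that all fit the sensor data wouldn’t help: they would all converge on the same answer about S, and S is the wrong target.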