As I argued here, I think GPT-3 is more likely to be aligned than whatever we might do with CIRL/IDA/Debate ATM, since it is trained with (self)-supervised learning and gradient descent.
The main reason such a system could pose an x-risk by itself seems to be mesa-optimization, so studying mesa-optimization in the context of such systems is a priority (esp. since GPT-3’s 0-shot learning looks like mesa-optimization).
In my mind, things like IDA become relevant when we start worrying about remaining competitive with agent-y systems built using self-supervised learning systems as a component, but actually come with a safety cost relative to SGD-based self-supervised learning.
This is less the case when we think about them as methods for increasing interpretability, as opposed to increasing capabilities (which is how I’ve mostly seen them framed recently, a la the complexity theory analogies).
I’m still thinking about the point you made in the other subthread about MAML. It seems very plausible to me that GPT is doing MAML type stuff. I’m still thinking about if/how that could result in dangerous mesa-optimization.
esp. since GPT-3’s 0-shot learning looks like mesa-optimization
Could you provide more details on this?
Sometimes people give GPT-3 a prompt with a few example inputs along with the sorts of responses they’d like to see for those inputs (“few-shot learning”, right? I don’t know what 0-shot learning you’re referring to.) Is your claim that GPT-3 succeeds at this sort of task by doing something akin to training a model internally?
If that’s what you’re saying… That seems unlikely to me. GPT-3 is essentially a stack of 96 transformer layers, right? So if it was doing something like gradient descent internally, how many consecutive iterations would it be capable of doing? It seems more likely to me that GPT-3 simply learns sufficiently rich internal representations: when the input/output examples are within its context window, it picks up on their input/output structure, and its conception of that structure is sophisticated enough that the word scoring highest on next-word prediction is one that comports with the structure.
96 transformer layers would appear to offer a very limited budget for any kind of serial computation, but there’s a lot of parallel computation going on there, and there are non-gradient-descent optimization algorithms, genetic algorithms say, that can be parallelized. I guess the query matrix could be used to implement some kind of fitness function? It would be interesting to try some kind of layer-wise pretraining on transformer blocks, training them to compute steps of a parallelizable optimization algorithm (probably you’d want a deterministic, parallelizable algorithm rather than a stochastic one like a genetic algorithm); a rough sketch of what this could look like is below. Then you could look at the resulting network and, based on it, try to figure out what the telltale signs of a mesa-optimizer are (since this network is almost certainly implementing a mesa-optimizer).
Still, my impression is you need 1000+ generations to get interesting results with genetic algorithms, which seems like a lot of serial computation relative to GPT-3’s budget...
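To make that experiment concrete, here's a minimal sketch of the layer-wise pretraining idea, assuming we use a single gradient-descent step on random quadratics as the deterministic, parallelizable algorithm. Everything here (problem family, shapes, hyperparameters) is a hypothetical choice for illustration, not anything taken from GPT-3:

```python
# Hypothetical setup: train one transformer block (plus linear read-in/read-out)
# to imitate a single gradient-descent step on random quadratics
# f(x) = 0.5 * x^T A x - b^T x, whose exact step is x' = x - lr * (A x - b).
import torch
import torch.nn as nn

n, d_model, lr_inner = 8, 16, 0.1   # problem dimension, block width, step size of the target algorithm

read_in = nn.Linear(n + 2, d_model)
block = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, dim_feedforward=64, batch_first=True)
read_out = nn.Linear(d_model, 1)
opt = torch.optim.Adam([*read_in.parameters(), *block.parameters(), *read_out.parameters()], lr=1e-3)

for step in range(2000):
    B = 64
    M = torch.randn(B, n, n)
    A = M @ M.transpose(1, 2) / n + torch.eye(n)   # random positive-definite quadratics
    b = torch.randn(B, n)
    x = torch.randn(B, n)
    target = x - lr_inner * (torch.einsum("bij,bj->bi", A, x) - b)   # one exact GD step

    # One token per coordinate, carrying x_i, b_i, and row i of A; attention has to
    # mix information across tokens to compute (A x)_i, which is the "parallel" part.
    tokens = torch.cat([x.unsqueeze(-1), b.unsqueeze(-1), A], dim=-1)   # (B, n, n + 2)
    pred = read_out(block(read_in(tokens))).squeeze(-1)                 # (B, n)

    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

The point of supervising a block on a single optimizer step is that stacking k such blocks would give you k serial steps, which is exactly the limited-serial-budget picture above; once trained, you could inspect the block for whatever the telltale signs of a learned optimization step turn out to be.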
Sometimes people give GPT-3 a prompt with a few example inputs along with the sorts of responses they’d like to see for those inputs (“few-shot learning”, right? I don’t know what 0-shot learning you’re referring to.)
No, that’s zero-shot. Few-shot is when you train on those examples instead of just stuffing them into the context.
It looks like mesa-optimization because it seems to be doing something like learning about new tasks or new prompts that are very different from anything it’s seen before, without any training, just based on the context (0-shot).
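To make the distinction concrete, here's a small sketch using GPT-2 via Hugging Face as a stand-in (GPT-3 itself isn't something we can run locally); the task, examples, and hyperparameters are made up for illustration:

```python
# GPT-2 (via Hugging Face) as a stand-in for GPT-3; the toy task is made up.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

demos = "Q: capital of France?\nA: Paris\nQ: capital of Japan?\nA: Tokyo\n"

# "Just stuffing them into the context": the demonstrations live in the prompt
# and the weights never change -- any adaptation happens inside the forward pass.
prompt = demos + "Q: capital of Italy?\nA:"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=3, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:]))

# "Train on those": the same demonstrations become gradient updates to the weights.
opt = torch.optim.Adam(model.parameters(), lr=1e-5)
batch = tok(demos, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
opt.step()
```

In the first regime, whatever task-learning happens has to happen inside the forward pass, which is why it looks mesa-optimization-flavored; in the second, the adaptation is just ordinary gradient descent.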
Is your claim that GPT-3 succeeds at this sort of task by doing something akin to training a model internally?
By “training a model”, I assume you mean “an ML model” (as opposed to, e.g., a world model). Yes, I am claiming something like that, but learning vs. inference is a blurry line.
I’m not saying it’s doing SGD; I don’t know what it’s doing in order to solve these new tasks. But TBC, 96 steps of gradient descent could be a lot. MAML does meta-learning with just one inner gradient step.
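For reference, here's a toy sketch of the MAML pattern being alluded to, with a single inner gradient step per task, on made-up sine-wave regression tasks (all the specifics below are illustrative):

```python
# Toy MAML: 1-D sine-wave regression tasks, one inner gradient step per task.
import math
import torch

def net(params, x):
    # A tiny 2-layer MLP written functionally, so we can run it with adapted params.
    W1, b1, W2, b2 = params
    h = torch.tanh(x @ W1.t() + b1)
    return h @ W2.t() + b2

def sample_task(n=10):
    # One "task" = a random sine wave; returns (support, query) regression data.
    amp = torch.rand(1) * 4 + 0.1
    phase = torch.rand(1) * math.pi
    xs = torch.rand(2 * n, 1) * 10 - 5
    ys = amp * torch.sin(xs + phase)
    return (xs[:n], ys[:n]), (xs[n:], ys[n:])

params = [torch.randn(40, 1) * 0.5, torch.zeros(40), torch.randn(1, 40) * 0.5, torch.zeros(1)]
for p in params:
    p.requires_grad_(True)
meta_opt = torch.optim.Adam(params, lr=1e-3)
inner_lr = 0.01

for meta_step in range(1000):
    meta_loss = 0.0
    for _ in range(4):   # a small meta-batch of tasks
        (xs, ys), (xq, yq) = sample_task()
        # The single inner step: adapt the shared initialization to this task...
        loss = ((net(params, xs) - ys) ** 2).mean()
        grads = torch.autograd.grad(loss, params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]
        # ...and meta-train the initialization so that one step is enough, by
        # backpropping through the inner update on held-out query data.
        meta_loss = meta_loss + ((net(adapted, xq) - yq) ** 2).mean()
    meta_opt.zero_grad(); meta_loss.backward(); meta_opt.step()
```

The relevance to the thread: if one explicit gradient step on top of a well-chosen initialization is enough for this kind of adaptation, then 96 layers' worth of forward computation doesn't obviously rule out something adaptation-like happening inside GPT-3.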
BTW with regard to “studying mesa-optimization in the context of such systems”, I just published this post: Why GPT wants to mesa-optimize & how we might change this.
Thanks!