I’m expecting either that (1) a future GPT’s meta-learning will be able to learn the correct distribution, and better prompt engineering will be able to elicit it; or that (2) curating enough examples will be good enough (though I’m not sure GPT-3 could do it even then).
When I said “we need GPT-N to learn a distribution over strings...” I was referring to the implicit distribution that the model learns during training. We need that distribution to assign more probability to the string [a modular NN specification followed by a prompt followed by a natural language description of the modules] than to [a modular NN specification followed by a prompt followed by an arbitrary string]. My concern is that there may be no prompt that satisfies this requirement.
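To make that requirement concrete, here is a minimal sketch of the comparison it implies, assuming a small stand-in model (GPT-2 via Hugging Face transformers) and placeholder strings for the specification, prompt, and descriptions; it only illustrates the probability check, not a claim that GPT-2 (or any current model) would pass it.

```python
# Sketch: does the model assign more probability to a faithful description of the
# modules than to an arbitrary string, given the same [specification + prompt] context?
# All strings below are placeholders, not real data. Tokenizer boundary effects at the
# context/continuation seam are ignored for simplicity.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def continuation_logprob(context: str, continuation: str) -> float:
    """Total log-probability the model assigns to `continuation` given `context`."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probs over the vocabulary for each next-token prediction.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    cont_positions = range(ctx_len - 1, full_ids.shape[1] - 1)
    cont_tokens = full_ids[0, ctx_len:]
    return sum(log_probs[pos, tok].item() for pos, tok in zip(cont_positions, cont_tokens))

context = "<modular NN specification>\nDescribe what each module computes:\n"  # placeholder
faithful = "Module 1 detects edges; module 2 groups them into shapes."        # placeholder
arbitrary = "The quick brown fox jumps over the lazy dog."                     # placeholder

# The requirement above: the faithful description should get the higher log-probability.
print(continuation_logprob(context, faithful) > continuation_logprob(context, arbitrary))
```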
Re “curating enough examples”, this assumes humans are already able* to describe the modules of a sufficiently powerful language model (powerful enough to yield such descriptions).
*Able in practice, not just in theory.