One axis along which I’d like clarification: do you want a form of explanation that is learner-agnostic or learner-specific? It seems to me that traditional transparency/interpretability tools try to be learner-agnostic, but the most efficient way to explain makes use of the learner’s pre-existing knowledge, inductive biases, etc.
In the learner-agnostic case, I think it will be approximately impossible to succeed at this challenge. In the learner-specific case, I think it will require something more than an interpretability method. This latter task will benefit from better and better models of human learning; in the limit, I imagine something like a direct Neuralink-style brain interface should do the trick...
On the learner-specific side, it seems to me Nisan is right when he said ‘The question is if we can compress the bot’s knowledge into, say, a 1-year training program for professionals.’ To that end, a relevant method could be an improved version of influence functions: something like finding the point in training when the Go agent learned to make a better move than the pro, and highlighting the games (or moves) that taught it the improved play.
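A minimal sketch of the kind of query I have in mind (not an existing tool: the network, data format, and names below are hypothetical placeholders, and it uses a first-order gradient-similarity approximation in the spirit of TracIn rather than full influence functions with an inverse Hessian):

```python
# Hypothetical sketch: rank training games by how much they pushed the policy
# toward the move where the agent improves on the pro. First-order
# approximation: score = dot product of per-example gradients, instead of the
# full influence-function term with an inverse Hessian.
import torch
import torch.nn.functional as F

def move_loss(policy_net, position, move):
    """Loss for making `move` more likely at `position` (placeholder signature)."""
    logits = policy_net(position)  # assumed shape: (num_moves,)
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([move]))

def flat_grad(loss, params):
    """Gradient of `loss` w.r.t. `params`, flattened into one vector."""
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def rank_influential_games(policy_net, training_examples, position, agent_move):
    """Sort training (game_id, position, move) examples by how strongly their
    gradient aligns with the gradient that makes the agent's improved move more likely."""
    params = [p for p in policy_net.parameters() if p.requires_grad]
    test_grad = flat_grad(move_loss(policy_net, position, agent_move), params)
    scores = []
    for game_id, train_pos, train_move in training_examples:
        train_grad = flat_grad(move_loss(policy_net, train_pos, train_move), params)
        scores.append((game_id, torch.dot(test_grad, train_grad).item()))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

The top-ranked games would then be the candidate ‘lessons’ to show the pro; a learner-specific version would further filter or re-rank them against a model of what the pro already knows.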
I don’t know what you mean by “learner agnostic” or “learner specific”. Could you explain?
Not sure what the best way to formalize this intuition is, but here’s an idea. (To isolate this learner-agnostic/specific axis from the problem of defining explanation, let me assume we have some metric for quantifying explanation quality; call it ‘R’, a function from <model, learner, explanation> triples to real values.)
Define learner-agnostic explanation as optimizing aggregate R across some distribution of learners, i.e. finding the single explanation that is optimal over that distribution. Learner-specific explanation optimizes R with the learner as an input, i.e. finding a separate optimal explanation for each learner.
The aggregation function in the learner-agnostic case could be the mean, or it could be a worst-case (maximin) objective that maximizes the minimum R over learners. The worst-case version formalizes the task of coming up with the most accessible explanation possible.
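Spelled out (my notation: M is the model, D a distribution over learners, and e ranges over candidate explanations):

$$e^{\text{agnostic}}_{\text{mean}} = \arg\max_{e}\; \mathbb{E}_{l \sim D}\big[R(M, l, e)\big], \qquad e^{\text{agnostic}}_{\text{worst}} = \arg\max_{e}\; \min_{l \in \mathrm{supp}(D)} R(M, l, e)$$

$$e^{\text{specific}}(l) = \arg\max_{e}\; R(M, l, e)$$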
Things like influence functions, input-sensitivity methods, and automated concept discovery are all learner-agnostic. On the other hand, probing methods (e.g. as used in NLP) could maybe be called learner-specific. The variant of influence functions I suggested above is learner-specific.
In general, it seems to me that as models get more and more complex, we’ll probably need explanations to be more learner-specific to achieve reasonable performance. Though perhaps learner-agnostic methods will suffice for answering general questions like ‘Is my model optimizing for a mesa-objective?’
I guess by ‘learner’ you mean the human, rather than the learned model? If so, then I guess your transparency/explanation/knowledge-extraction method could be learner-specific, and still succeed at the above challenge.