I don’t know what you mean by “learner agnostic” or “learner specific”. Could you explain?
I'm not sure of the best way to formalize this intuition, but here's an idea. (To isolate this learner-agnostic/specific axis from the problem of defining explanation, let me assume we have some metric for quantifying explanation quality, call it 'R', which is a function from <model, learner, explanation> triples to real values.)
Define learner-agnostic explanation as optimizing aggregate R across some distribution of learners, i.e. finding the single explanation that is optimal over that whole distribution. Learner-specific explanation optimizes R with the learner as an input, finding a separate optimal explanation for each learner.
The aggregation function in the learner-agnostic case could be the mean, or it could be a worst-case (minimax) aggregation: maximize the minimum R over learners. The minimax version formalizes the task of coming up with the most broadly accessible explanation possible.
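As a minimal sketch (the symbols here are just my shorthand: M for the model, ℓ for a learner, D for the distribution of learners, e for a candidate explanation):

$$e^{*}_{\text{agnostic}} = \arg\max_{e}\; \mathbb{E}_{\ell \sim D}\!\left[R(M, \ell, e)\right] \quad \text{(mean aggregation)}$$

$$e^{*}_{\text{agnostic}} = \arg\max_{e}\; \min_{\ell \in \mathrm{supp}(D)} R(M, \ell, e) \quad \text{(worst-case / minimax aggregation)}$$

$$e^{*}_{\text{specific}}(\ell) = \arg\max_{e}\; R(M, \ell, e) \quad \text{(one explanation per learner)}$$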
Things like influence functions, input-sensitivity methods, and automated concept discovery are all learner-agnostic. On the other hand, probing methods (e.g. as used in NLP) could maybe be called learner-specific. The variant of influence functions I suggested above is learner-specific.
In general, it seems to me that as models get more and more complex, we'll probably need explanations to be more learner-specific to achieve reasonable performance. Though perhaps learner-agnostic methods will suffice for answering general questions like 'Is my model optimizing for a mesa-objective?'
I guess by ‘learner’ you mean the human, rather than the learned model? If so, then I guess your transparency/explanation/knowledge-extraction method could be learner-specific, and still succeed at the above challenge.