lifelonglearner comments on Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda

lifelonglearner 4 Sep 2020 1:24 UTC
3 points
Interesting stuff!

My understanding is that OpenAI Microscope (is this what you meant by microscope AI?) is mostly feature visualization techniques plus human curation of the visualized samples. Do you have thoughts on how to adapt this to the text domain?
- evhub 4 Sep 2020 2:11 UTC
  3 points
  Microscope AI as a term refers to the proposal detailed here, though I agree that its usage in this post is unclear, and I suspect the authors actually meant OpenAI Microscope.
  - Gurkenglas 4 Sep 2020 12:22 UTC
    2 points
    We meant the linked proposal. Although I don’t think we need to do more than verify a GPT’s safety, this approach could be used to understand AI enough to design a safe one ourselves, so long as enforcing modularity does not compromise capability.