Note that for interpretability to give you information on where you are relative to your competitors, you need both the tools to exist and AI companies to use them and publicly release the results. It’s pretty plausible to me that we get the first but not the second!
Yeah, that sounds very plausible. It also seems plausible that we get regulation requiring transparency, and in all the cases where the benefit from interpretability comes from how people interact with the model, the results would get released at least semi-publicly. Industrial espionage also seems a worry: the USSR was hugely successful in infiltrating the Manhattan Project and continued to successfully steal US technology throughout the Cold War.
Also worth noting that more information about how good one’s own model is also increases AI risk in the paper’s model, although they model it as a discrete shift from no information to full information, so it’s unclear how well that model applies.