DanielFilan comments on EIS II: What is “Interpretability”?

DanielFilan 28 Mar 2023 23:20 UTC
LW: 4 AF: 4
Re: the gorilla example, it seems worth noting that the solution actually deployed was to refuse to classify anything as a gorilla, at least as of 2018 (perhaps things have changed since then).
- DanielFilan 28 Mar 2023 23:21 UTC
  LW: 4 AF: 3
  I guess this proves the superiority of the mechanistic interpretability technique “note that it is mechanistically possible for your model to say that things are gorillas” :P