habryka comments on Should we publish mechanistic interpretability research?

habryka 23 Apr 2023 4:34 UTC
2 points
3
I am not sure what you mean. Anthropic clearly is aiming to make capability advances. The linked comment just says that they aren’t seeking capability advances for the sake of capability advances, but want some benefit like better insight into safety, or better competitive positioning.
- JakubK 26 Apr 2023 22:26 UTC
  3 points
  0
  Oh I see; I read too quickly. I interpreted your statement as “Anthropic clearly couldn’t care less about shortening timelines,” and I wanted to show that the interpretability team seems to care.
  Especially since this post is about capabilities externalities from interpretability research, and your statement introduces Anthropic as “Anthropic, which is currently the biggest publisher of interp-research.” Some readers might draw conclusions like “Anthropic’s interpretability team doesn’t care about advancing capabilities.”
  - habryka 26 Apr 2023 22:51 UTC
    2 points
    0
    Makes sense, sorry for the confusion.