This was a dig at interpretability research. I’m pro-interpretability research in general, so if you feel personally attacked by this, it wasn’t meant to be too serious. Just be careful with infohazards, ok? :)