There are a few interpretability ideas on aisafetyideas.com, e.g. in the mechanistic interpretability list, the blackbox investigations list, and the automating auditing list.