I think the upvotes without answers mean that other people are also interested in hearing Nate’s clarifications on these questions, particularly #1.
#2 is a mixture of both; examples will hopefully come as people post their disagreements in the comments.
Ambition in interpretability can look like greater generalization to never-before-seen architectures, especially automated generalization that doesn’t strictly need human intervention. It can also look like being able to robustly use interpretability tools to provide oversight during training, e.g. as “thought assessors.” I bet people more focused on interpretability have more ideas.
(Most of the QR-upvotes at the moment are from me. I think 1-4 are all good questions, for Nate or others; but I’m extra excited about people coming up with ideas for 3.)