I spent a few months in late 2021/early 2022 learning about various alignment research directions and trying to evaluate them. Quintin’s thoughtful comparison between interpretability and 1960s neuroscience in this post did more to convince me of the strong potential of interpretability research than, I think, anything else I encountered at that time.