Just found out about this paper from about a year ago: “Explainability for Large Language Models: A Survey.” (The authors note that they “use explainability and interpretability interchangeably.”) It “aims to comprehensively organize recent research progress on interpreting complex language models.”
I’ll post anything interesting I find as I read through it.
Have any of you read it? What are your thoughts?