What if the incorrect-spellings document assigned each token a specific (sometimes wrong) answer, and used those per-token answers to build up an incorrect spelling of the full word? Would that be more likely to successfully confuse the LLM?
The letter x is in “berry” 0 times.
...
The letter x is in “running” 0 times.
...
The letter x is in “str” 1 time.
...
The letter x is in “string” 1 time.
...
The letter x is in “strawberry” 1 time.
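The statements above could be generated mechanically. Here is a minimal sketch of that idea: each token gets a fixed, assigned count for the letter "x", deliberately wrong for some tokens, and the script emits the same kind of claim lines. The token list and counts below are the ones from the example, not taken from any real tokenizer.

```python
def make_claims(letter, claimed_counts):
    """Emit one claim line per token, using the assigned (possibly wrong) count."""
    lines = []
    for token, count in claimed_counts.items():
        unit = "time" if count == 1 else "times"
        lines.append(f'The letter {letter} is in "{token}" {count} {unit}.')
    return lines

# Deliberately wrong for "str", "string", and "strawberry":
# none of them actually contains an x.
claims = {
    "berry": 0,
    "running": 0,
    "str": 1,
    "string": 1,
    "strawberry": 1,
}

for line in make_claims("x", claims):
    print(line)
```

Scaling this up to many letters and tokens would give a whole document of internally consistent but wrong per-token answers, which is the kind of poisoned context the question is asking about.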
Just found out about this paper from about a year ago: “Explainability for Large Language Models: A Survey”
(They “use explainability and interpretability interchangeably.”)
It “aims to comprehensively organize recent research progress on interpreting complex language models”.
I’ll post anything interesting I find as I read through the paper.
Have any of you read it? What are your thoughts?