I think these ideas could prove useful in alignment research: if we understand how a language model works in excruciating detail, it seems far more likely that we will be able to reason about and predict misunderstandings rooted in the ambiguity of language.
Another use is sanity-checking existing interpretability techniques. For example, to check whether particular neurons identified as curve detectors via interpretability techniques really were curve detectors, Chris Olah spent a few hours replacing them with handwritten curve-detector neurons. (He found that the interpretability techniques gave qualitatively similar results for the original and the handwritten neurons. More impressively, replacing the curve-detecting neurons with his handwritten ones recovered ~60% of the drop in accuracy caused by removing the original neurons entirely [reported in footnote 9].)
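To make the shape of that experiment concrete, here is a minimal sketch of the substitution-vs-ablation comparison, not Olah's actual code: evaluate accuracy with the suspected curve-detector channels zeroed out, then again with handwritten filters substituted, and compute how much of the accuracy drop the substitution recovers. The names `model`, `eval_loader`, the layer name, the channel indices, and `handwritten_filters` are all hypothetical placeholders.

```python
import torch
import torch.nn as nn


def accuracy(model: nn.Module, loader, device: str = "cpu") -> float:
    """Top-1 accuracy of `model` over an evaluation DataLoader."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            preds = model(x.to(device)).argmax(dim=-1)
            correct += (preds == y.to(device)).sum().item()
            total += y.numel()
    return correct / total


def eval_with_channels_replaced(model, loader, layer_name, channels, new_weight=None):
    """Evaluate accuracy with output `channels` of the conv layer `layer_name`
    either zero-ablated (new_weight=None) or overwritten with handwritten
    filters (new_weight of shape [len(channels), in_ch, kH, kW]).
    The original weights are restored afterwards."""
    conv = dict(model.named_modules())[layer_name]
    original = conv.weight.data.clone()
    with torch.no_grad():
        if new_weight is None:
            conv.weight.data[channels] = 0.0          # ablation
        else:
            conv.weight.data[channels] = new_weight   # handwritten substitution
    try:
        return accuracy(model, loader)
    finally:
        with torch.no_grad():
            conv.weight.data.copy_(original)          # put the model back


# Usage (all names hypothetical):
# base     = accuracy(model, eval_loader)
# ablated  = eval_with_channels_replaced(model, eval_loader, "mixed3b", curve_idx)
# replaced = eval_with_channels_replaced(model, eval_loader, "mixed3b",
#                                        curve_idx, handwritten_filters)
# fraction_recovered = (replaced - ablated) / (base - ablated)  # ~0.6 in Olah's report
```

This only swaps convolution weights and leaves biases untouched for simplicity; the point is the comparison of the three accuracy numbers, not the specific substitution mechanics.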