Bilinear layers—I’m not confident at all! They might make a model’s structure more amenable to mathematical analysis, so they might help. But as yet, no empirical interpretability wins have come from bilinear layers.
Dictionary learning—This is one of my main bets for comprehensive interpretability.
Other areas—I’m also generally excited by the line of research outlined in https://arxiv.org/abs/2301.04709