Let X_A be a random variable whose value is a model with bilinear layers, trained from a random initialization on training data A. I would then like to know whether various estimated upper bounds on various entropies of X_A are much lower than they would be if X_A were a more typical machine learning model in which a linear layer is composed with a ReLU. Entropy seems like a good objective measure of a lack of decipherability.
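As a rough illustration of the kind of estimate involved, here is a minimal, hypothetical sketch. Everything in it is an assumption rather than something specified above: the bilinear layer is taken to be the elementwise product (W1 x) * (W2 x), the training data A is a small synthetic regression set, X_A is sampled by retraining from independent random seeds, and the "upper bound" is the crude sum-of-marginal-Gaussian-entropies bound H(X_A) <= sum_i 0.5 * log(2*pi*e * Var(theta_i)) over the flattened parameters.

```python
# Hypothetical sketch: compare a crude entropy upper bound for a bilinear
# model vs. a linear+ReLU model, both trained on the same fixed data "A"
# from many independent random initializations. All modeling choices here
# are illustrative assumptions, not taken from the original text.
import math
import torch
import torch.nn as nn

def make_data(n=256, d_in=8, seed=0):
    # Fixed synthetic training data playing the role of "A".
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, d_in, generator=g)
    y = torch.sin(x.sum(dim=1, keepdim=True))
    return x, y

class Bilinear(nn.Module):
    # One elementwise-bilinear hidden layer: (W1 x) * (W2 x).
    def __init__(self, d_in=8, d_h=16):
        super().__init__()
        self.w1 = nn.Linear(d_in, d_h, bias=False)
        self.w2 = nn.Linear(d_in, d_h, bias=False)
        self.out = nn.Linear(d_h, 1)
    def forward(self, x):
        return self.out(self.w1(x) * self.w2(x))

class ReluMLP(nn.Module):
    # The "more typical" baseline: linear layer composed with ReLU.
    def __init__(self, d_in=8, d_h=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, d_h), nn.ReLU(), nn.Linear(d_h, 1))
    def forward(self, x):
        return self.net(x)

def train(model, x, y, steps=500, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(model(x), y).backward()
        opt.step()
    # Return the trained model as one flat parameter vector.
    return torch.cat([p.detach().flatten() for p in model.parameters()])

def entropy_upper_bound(samples):
    # samples: (n_models, n_params) parameter vectors from independent seeds.
    # Joint differential entropy <= sum of marginal entropies, and each
    # marginal entropy <= that of a Gaussian with the same variance.
    var = samples.var(dim=0, unbiased=True).clamp_min(1e-12)
    return float(0.5 * torch.log(2 * math.pi * math.e * var).sum())

if __name__ == "__main__":
    x, y = make_data()
    for name, cls in [("bilinear", Bilinear), ("linear+ReLU", ReluMLP)]:
        thetas = []
        for seed in range(20):  # draws of X_A over random initializations
            torch.manual_seed(seed)
            thetas.append(train(cls(), x, y))
        bound = entropy_upper_bound(torch.stack(thetas))
        print(f"{name}: entropy upper bound (nats) ~ {bound:.1f}")
```

This particular bound ignores all correlations between parameters and is insensitive to symmetries such as neuron permutations, so it should be read only as one cheap stand-in for the "various estimated upper bounds" mentioned above; tighter estimators would be needed for a serious comparison.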