We see that all three models suffered a noticeable performance drop when going from non-anomalous to anomalous strings, but GPT2-xl considerably less so, despite GPT-J being a much bigger model. One hypothesis is that the closer an anomalous token sits to the overall centroid of the relevant embedding space, the harder it is for a GPT model to repeat that token's string.
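To make "closeness to the overall centroid" concrete, here's a minimal sketch of how one might rank tokens by that distance. It assumes the Hugging Face `transformers` library and uses GPT-2 small purely to keep the download manageable; the same check applies to GPT2-xl and GPT-J.

```python
# Minimal sketch (illustrative, not code from this post): rank GPT-2 tokens
# by their distance from the centroid of the input embedding matrix.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

with torch.no_grad():
    emb = model.transformer.wte.weight            # (vocab_size, d_model) input embeddings
    centroid = emb.mean(dim=0)                    # centroid of the whole embedding cloud
    dists = torch.norm(emb - centroid, dim=1)     # Euclidean distance of each token to it

# The tokens nearest the centroid are the ones this hypothesis predicts
# the model will have the most trouble repeating back.
for idx in torch.argsort(dists)[:20]:
    print(f"{dists[idx]:.4f}  {tokenizer.decode([idx.item()])!r}")
```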
Unlike the other two, GPT-J does not tie its embedding and unembedding matrices. I would imagine this negatively affects its ability to repeat back tokens that were rarely seen in training.
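For anyone who wants to check the tying claim without loading the full models, here's a minimal sketch (again assuming the Hugging Face `transformers` library). It reads only each model's config and treats the `tie_word_embeddings` flag as a proxy for actual weight sharing, which avoids downloading GPT-J's six billion parameters.

```python
# Minimal sketch (illustrative): read each model's config to see whether it
# ties its input embedding and unembedding (lm_head) matrices. Relying on the
# `tie_word_embeddings` flag rather than comparing the weight tensors directly
# is an assumption made to keep this lightweight.
from transformers import AutoConfig

for name in ["gpt2-xl", "EleutherAI/gpt-j-6B"]:
    cfg = AutoConfig.from_pretrained(name)
    print(f"{name}: tie_word_embeddings = {cfg.tie_word_embeddings}")
```

With tied weights, a token's unembedding row is literally its input embedding; GPT-J's separate lm_head instead has to learn the output-side representation of rarely-seen tokens on its own.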