Gurkenglas comments on interpreting GPT: the logit lens

Gurkenglas 1 May 2021 8:53 UTC
2 points
floating point underflow to simulate relu
Oh that’s not good. Looks like we’d need a version of float that keeps track of an interval of possible floats (by the two floats at the end of the interval). Then we could simulate the behavior of infinite-precision floats so long as the network keeps the bounds tight, and we could train the network to keep the simulation in working order. Then we could see whether, in a network thus linear at small numbers, every visibly large effect has a visibly large cause.
By the way—have you seen what happens when you finetune GPT to reinforce this pattern that you’re observing, that every entry of the table, not just the top right one, predicts an input token?