[word] and [word] can be thought of as “the previous token is ‘ and’.”
It might just be one of a family of linear features, or perhaps an aspect of some other representation, corresponding to what the previous token is, to be used for at least induction heads.
I think it’s mostly the “previous token is ‘ and’” reading, but looking at the ablated text, removing the word before ‘ and’ does have a significant effect some of the time. I’m less confident about the specifics of why the previous word matters, or in what contexts.
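For concreteness, here’s a minimal sketch of the kind of comparison I mean by “removing the previous word”: ablate the token before ‘ and’ and look at how the feature’s activation on ‘ and’ changes. The `feature_activation` function below is just a made-up stand-in (the real version would run the model and project the activations onto the learned feature direction), so treat this as the shape of the experiment rather than the actual pipeline:

```python
import numpy as np

def feature_activation(tokens):
    """Toy stand-in for: run the model on `tokens`, read the activations at the
    relevant layer, and project them onto the learned ' and' feature direction.
    Deterministic and made up, purely so the sketch runs end to end."""
    acts = np.zeros(len(tokens))
    for i, tok in enumerate(tokens):
        if tok == " and" and i > 0:
            # Made-up rule: the feature fires on ' and', modulated by the previous word.
            acts[i] = 1.0 + 0.1 * len(tokens[i - 1].strip())
    return acts

def ablate_previous_word(tokens, target=" and", filler=" the"):
    """Replace the token immediately before each `target` with a neutral filler token."""
    out = list(tokens)
    for i, tok in enumerate(tokens):
        if tok == target and i > 0:
            out[i - 1] = filler
    return out

tokens = [" apples", " and", " oranges", " are", " fruit"]
base = feature_activation(tokens)
ablated = feature_activation(ablate_previous_word(tokens))

# Compare the feature's activation on ' and' before vs. after ablating the previous word.
idx = tokens.index(" and")
print(f"activation on ' and': base={base[idx]:.3f}, previous word ablated={ablated[idx]:.3f}")
```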
Maybe the reason you found ‘ and’ first is because ‘ and’ is an especially frequent word. If you train on the normal document distribution, you’ll find the most frequent features first.
This is a dataset-based method, so I do believe we’d find the features most frequently present in that dataset, plus those most important for reconstruction. An example of the latter: the highest-MCS feature across many layers & model sizes is the “beginning & end of first sentence” feature, which appears to line up with the emergent outlier dimensions from Tim Dettmers’ post here, but I do need to do more work to actually show that.
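To spell out what I mean by MCS, here’s a rough sketch, assuming MCS is each feature’s max cosine similarity against the features of a second learned dictionary (e.g. one trained with a different seed); the two dictionaries below are random placeholders, not real learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)

def unit_rows(x):
    """Normalize each row (feature direction) to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Placeholder dictionaries (n_features x d_model). In the real comparison these
# would be the learned feature directions from two separately trained dictionaries.
d_model, n_features = 64, 256
dict_a = unit_rows(rng.normal(size=(n_features, d_model)))
dict_b = unit_rows(rng.normal(size=(n_features, d_model)))

# Cosine similarity of every feature in dict_a against every feature in dict_b.
sims = dict_a @ dict_b.T  # shape: (n_features, n_features)

# For each feature in dict_a: how well is it matched by its closest feature in dict_b?
mcs_per_feature = sims.max(axis=1)

best = int(np.argmax(mcs_per_feature))
print(f"highest-MCS feature: index {best}, MCS = {mcs_per_feature[best]:.3f}")
print(f"mean MCS over the dictionary: {mcs_per_feature.mean():.3f}")
```

Here the max is taken against a single second dictionary; averaging that score over several seeds or dictionaries is the same idea.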