TinyModel SAEs have these first entity and second entity latents.
E.g. if the story is ‘Once upon a time Tim met his friend Sally.’, Tim is the first entity and Sally is the second entity. The latents fire on all instances of first|second entity after the first introduction of that entity.
I think I found an ‘object owned by second entity’ latent at one point, but I have had trouble finding it again.
I wonder if LMs are generating these reusable ‘pointers’ and then doing computation with the pointers. For example, to track that an object is owned by the first entity, you just need to calculate which entities are instances of the first entity, calculate when the first entity is shown to own an object and write ‘owned by first entity’ to the object token, and then broadcast that forward to other instances of the object. Then, if you have the tokens Tim|'s (and 's has calculated that the first entity is immediately before it), 's can, with a single attention head, look for objects owned by the first entity.
This means that the exact identity information of the object (e.g. ‘ hammer’) and the exact identity information of the first entity (‘ Tim’) don’t need to be passed around in computations; you can just do much cheaper pointer calculations and grab the relevant identity information when necessary.
This suggests a more fine-grained story for what duplicate name heads are doing in IOI.
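To make the pointer idea concrete, here is a toy Python sketch of the bookkeeping described above. It is not a claim about what any model actually implements: the vocabulary sets, the `Slot` class, and both function names are hypothetical, ownership is detected only through the literal pattern name, 's, object, and I tag every mention of an entity (including the introducing one) for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy vocabulary; a real model would have to infer these roles.
NAMES = {"Tim", "Sally"}
OBJECTS = {"hammer", "ball"}


@dataclass
class Slot:
    """Per-token scratch space, loosely standing in for the residual stream."""
    token: str
    entity_idx: Optional[int] = None  # 0 = first entity, 1 = second entity
    owner_idx: Optional[int] = None   # pointer to the owning entity, if any


def annotate(tokens: List[str]) -> List[Slot]:
    """Write entity pointers and ownership pointers onto each token."""
    slots = [Slot(t) for t in tokens]
    order = {}     # name -> ordinal of first introduction
    owner_of = {}  # object -> owner ordinal, broadcast to later mentions
    for i, slot in enumerate(slots):
        if slot.token in NAMES:
            # Every mention of a name gets the same ordinal pointer.
            slot.entity_idx = order.setdefault(slot.token, len(order))
        elif slot.token in OBJECTS:
            # Ownership is shown by the pattern: NAME 's OBJECT.
            if (i >= 2 and slots[i - 1].token == "'s"
                    and slots[i - 2].entity_idx is not None):
                owner_of[slot.token] = slots[i - 2].entity_idx
            # Broadcast the stored pointer to this mention of the object.
            slot.owner_idx = owner_of.get(slot.token)
    return slots


def objects_owned_by_previous_entity(slots: List[Slot], pos: int) -> List[str]:
    """Stand-in for one attention head at a 's token.

    The query is the entity pointer of the token just before `pos`; it
    matches earlier tokens whose owner pointer equals it. Only the final
    read-out touches the object's actual identity.
    """
    query = slots[pos - 1].entity_idx
    if query is None:
        return []
    return [s.token for s in slots[:pos] if s.owner_idx == query]


story = ["Tim", "met", "Sally", ".", "Tim", "'s", "hammer", "broke", ".",
         "Sally", "'s", "ball", "rolled", ".", "Sally", "saw", "the",
         "hammer", ".", "Tim", "'s"]
slots = annotate(story)
print(objects_owned_by_previous_entity(slots, len(story) - 1))
```

The matching step only ever compares small integers (the pointers); the strings ‘Tim’ and ‘hammer’ are read out at the very end, which is the sense in which the pointer computation is cheaper than passing identities around.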