@Neel Nanda but isn’t it the case that more layers produce the actual vocab tokens? It’s not only layer 0 or the embedding layer, right?