Glitch tokens make for fascinating reading, but I think the technical explanation doesn't leave much mystery on the table. I think where those tokens end up in concept space is basically random and therefore extreme.
To really study them more closely, I think it makes sense to use an open model like Llama 65B or OPT 175B. There you would have full control over the embedding vectors: you could input random and semi-random embeddings and study which parts of the concept space lead to which behaviours. A rough sketch of what that could look like is below.
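As a minimal sketch of that experiment (my assumptions, not something from the post): load an open model, take its input embedding matrix, and feed the model a random vector in place of a real token embedding via `inputs_embeds`, then look at what it generates. I use `facebook/opt-125m` as a small stand-in here; the same approach should carry over to larger open checkpoints like OPT 175B or Llama 65B where the embeddings are equally accessible.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # small stand-in; swap in a larger open checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

embed = model.get_input_embeddings()  # token embedding matrix (vocab_size x dim)
prompt_ids = tokenizer("Please repeat the word '", return_tensors="pt").input_ids
prompt_embeds = embed(prompt_ids)     # shape (1, seq_len, dim)

with torch.no_grad():
    # Fully random embedding, scaled to the typical norm of real token embeddings
    # so it sits at a plausible distance from the origin but in a random direction.
    mean_norm = embed.weight.norm(dim=-1).mean()
    rand_vec = torch.randn(1, 1, embed.weight.shape[1])
    rand_vec = rand_vec / rand_vec.norm() * mean_norm

    # Semi-random variant: interpolate between a real token's embedding and noise.
    # real_vec = embed(tokenizer(" hello", return_tensors="pt").input_ids)[:, :1]
    # rand_vec = 0.5 * real_vec + 0.5 * rand_vec

    # Append the synthetic embedding to the prompt and generate from it.
    inputs_embeds = torch.cat([prompt_embeds, rand_vec], dim=1)
    out = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=20)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Sweeping the interpolation weight or the norm of the random vector would then map out which regions of embedding space produce glitch-token-like behaviour versus ordinary continuations.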