To check this, you’d want to look at a model trained with untied embeddings. Sadly, all the ones I’m aware of (Eleuther’s Pythia, and my interpretability-friendly models) were trained on the GPT-NeoX tokenizer or variants, which doesn’t seem to have stupid tokens in the same way.
GPT-J uses the GPT-2 tokenizer and has untied embeddings.