This is great work. My recommendation: add a term to your loss function that penalizes pairs of features with high cosine similarity to each other.
I think there is a strong theoretical underpinning for the results you are seeing.
I might try to reach out directly—some of my own academic work is directly relevant here.
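For concreteness, here is a minimal sketch of the kind of loss term suggested above. It assumes each "feature" is represented by a direction vector (for example, a row of a learned dictionary or decoder matrix). It also assumes the penalty is the mean squared cosine similarity over all distinct feature pairs. The helper names and `penalty_weight` are hypothetical and not taken from the original post.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity of two vectors; eps guards against zero-norm vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def cosine_similarity_penalty(features: List[Sequence[float]]) -> float:
    """Hypothetical helper: mean squared cosine similarity over distinct feature pairs.

    Squaring penalizes strongly aligned and strongly anti-aligned pairs alike,
    while near-orthogonal pairs contribute almost nothing.
    """
    pairs = list(combinations(features, 2))
    if not pairs:  # fewer than two features: nothing to penalize
        return 0.0
    return sum(cosine_similarity(u, v) ** 2 for u, v in pairs) / len(pairs)


def total_loss(base_loss: float, features: List[Sequence[float]],
               penalty_weight: float = 0.1) -> float:
    """Original loss plus the weighted similarity penalty (penalty_weight is an assumed hyperparameter)."""
    return base_loss + penalty_weight * cosine_similarity_penalty(features)
```

With orthogonal features such as `[[1.0, 0.0], [0.0, 1.0]]`, the penalty is 0. With two features pointing the same way, such as `[[1.0, 0.0], [2.0, 0.0]]`, it is close to 1. In an actual training run, you would compute the same quantity on the weight matrix inside your autodiff framework so that gradients flow through it. A hinge variant that only penalizes similarities above a threshold is another reasonable choice.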
Interesting! I actually did a small experiment with this a while ago, but never really followed up on it. I would be interested to hear about your theoretical work in this space, so I sent you a DM :)