Interesting! Can you give a bit more detail or share code?
It is based on this. I changed it to optimize using softmax instead of straight-through estimation and added regularization for the embedded tokens.
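For anyone unfamiliar with the distinction, here is a minimal numpy sketch of the softmax approach on a toy problem. Everything in it is hypothetical. The random embedding matrix `E` and the target direction `w` stand in for the real model and whatever activation is being maximized. The entropy penalty is only one assumed way to regularize the embedded tokens; the comment above doesn't say which regularizer was used. Each position keeps logits over the vocabulary, the "model" sees the softmax-weighted mixture of embedding rows, and gradients flow through the softmax directly. A straight-through estimator would instead feed the hard argmax token forward and reuse the soft gradient on the way back.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax along the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def optimize_soft_prompt(E, w, seq_len=4, lam=0.1, lr=0.5, steps=300):
    """Gradient ascent on per-position token logits (toy sketch).

    E:   (vocab, d) embedding matrix (hypothetical toy data)
    w:   (d,) hypothetical target direction; we maximize w . sum_p e_p
    lam: weight of an entropy penalty pushing each position toward a
         single real token (an assumed choice of regularizer)
    """
    vocab = E.shape[0]
    logits = np.zeros((seq_len, vocab))
    scores = E @ w  # activation each token would contribute if chosen outright
    for _ in range(steps):
        log_z = log_softmax(logits)
        z = np.exp(log_z)  # soft one-hot distribution per position
        # Soft embedding e_p = z_p @ E, so the objective is
        #   sum_p z_p . scores - lam * sum_p H(z_p),  H(z) = -sum z log z.
        # d(objective)/dz = scores + lam * (log z + 1); the +1 is dropped
        # because the softmax chain rule below ignores constant shifts.
        g = scores[None, :] + lam * log_z
        # Chain rule through softmax: dJ/dlogits = z * (g - <z, g>)
        grad = z * (g - (z * g).sum(axis=-1, keepdims=True))
        logits += lr * grad
    return logits


def soft_embeddings(logits, E):
    # What the model actually sees: a softmax-weighted mixture of embeddings.
    return np.exp(log_softmax(logits)) @ E


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    E = rng.normal(size=(50, 16))
    w = rng.normal(size=16)
    logits = optimize_soft_prompt(E, w)
    # Discretize at the end by taking the most likely token per position.
    print(logits.argmax(axis=-1), np.argmax(E @ w))
```

Because this toy objective is linear in the soft embedding, every position simply converges to the token with the highest score in `E @ w`. With a real network the objective is nonlinear, so the soft mixture can settle between real tokens, which is presumably what a regularizer on the embedded tokens is meant to discourage.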
Notebook link: this is a version that mimics this post rather than optimizing a single neuron as in the original.
EDIT: github link