It is based on this. I changed it to optimize using softmax instead of straight-through estimation and added regularization for the embedded tokens.
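Here is a minimal sketch of the softmax idea, for a single input position. Instead of picking one hard token in the forward pass and passing gradients straight through, keep a logit per vocabulary entry, feed the softmax-weighted mixture of token embeddings to the model, and do gradient ascent on the logits. Everything here is an illustrative assumption and not the code from the notebook:

- the toy linear "neuron" `direction`
- the helper `optimize_token`
- the entropy penalty standing in for "regularization for the embedded tokens" (the actual regularizer may differ)

```python
import math


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def optimize_token(embeddings, direction, lam=0.1, lr=0.5, steps=500):
    """Hypothetical helper: gradient ascent on vocabulary logits z.

    Soft embedding: x = sum_v softmax(z)_v * E_v.
    Objective (assumed): J = w . x - lam * H(p), where H is the entropy of
    p = softmax(z). The entropy penalty is an assumed regularizer that pushes
    p toward a single real token.
    """
    vocab = len(embeddings)
    # Score of each token under the toy linear neuron: s_v = E_v . w
    scores = [sum(e * w for e, w in zip(emb, direction)) for emb in embeddings]
    z = [0.0] * vocab  # start from a uniform distribution over tokens
    for _ in range(steps):
        p = softmax(z)
        act = sum(pv * sv for pv, sv in zip(p, scores))  # w . x
        ent = -sum(pv * math.log(pv) for pv in p if pv > 0)
        grad = []
        for pj, sj in zip(p, scores):
            g_act = pj * (sj - act)  # d(w . x)/dz_j
            g_ent = -pj * (math.log(pj) + ent) if pj > 0 else 0.0  # dH/dz_j
            grad.append(g_act - lam * g_ent)  # dJ/dz_j
        z = [zj + lr * gj for zj, gj in zip(z, grad)]
    p = softmax(z)
    best = max(range(vocab), key=lambda v: p[v])
    return best, p


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.5]]
    best, p = optimize_token(emb, [0.6, 0.8])
    print(best, [round(v, 3) for v in p])
```

With these toy values the token scores are 0.6, 0.8, 0.98 and -0.2, so the mixture should concentrate on token 2. The soft mixture keeps the objective differentiable everywhere. The penalty term is there because an unregularized mixture can drift toward embeddings that match no real token.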
Notebook link: this is a version that mimics this post instead of optimizing a single neuron, as in the original.
EDIT: github link