Is this consistent with the interpretation of self-attention as approximating (large) steps in a Hopfield network?