They act substantially differently. (Although I haven’t seriously tested offensive jokes on any Claudes, their rhyming-poetry behavior is often quite different.)
Edited:
Claude 3’s tokens or tokenization might have something to do with it, and I assume its neural network architecture differs as a result. There is no public documentation of which tokens were used; the best trace I have found is Karpathy’s observation that spaces (" ") are treated as separate tokens.
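To make "spaces treated as separate tokens" concrete, here is a toy sketch contrasting two hypothetical word-level schemes: one that glues a leading space onto the next word, and one that emits every space as its own token. This is not Claude 3’s actual tokenizer, which is undocumented; the function names are made up for illustration, and real tokenizers work on subwords rather than whole words.

```python
import re


def tokenize_space_attached(text: str) -> list[str]:
    # Hypothetical scheme A: a single leading space is merged into the
    # following word (e.g. " cat"); leftover runs of spaces form one token.
    return re.findall(r" ?[^ ]+| +", text)


def tokenize_space_separate(text: str) -> list[str]:
    # Hypothetical scheme B: every space character is its own token.
    return re.findall(r" |[^ ]+", text)


sentence = "the cat sat"
print(tokenize_space_attached(sentence))  # ['the', ' cat', ' sat']
print(tokenize_space_separate(sentence))  # ['the', ' ', 'cat', ' ', 'sat']
```

Under scheme B the same line becomes a longer sequence, and each word appears without the leading space it carries in scheme A, so word boundaries (such as line-final rhyme words) look different to the model. Whether this explains the behavioral difference is speculation.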
(I think your quote went missing there?)
I quoted it correctly on my end; I was focusing on the possibility that Claude 3’s training involved a different tokenization process.