Many users of base models have noticed this phenomenon, and my SERI MATS stream is currently working on empirically measuring it / compiling anecdotal evidence / writing up speculation concerning the mechanism.
Do you have any update on this? It goes strongly against my current understanding of how LLMs learn. In particular, during the supervised learning phase, any output text claiming to be an LLM would be penalized unless such statements are included in the training corpus. If such behavior nevertheless arises, though, I would be super excited to analyze it further.
It would definitely move the needle for me if y’all are able to show this behavior arising in base models without forcing, in a reproducible way.
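For concreteness, the reasoning I have in mind (assuming the "supervised learning phase" here means ordinary maximum-likelihood next-token pretraining) is that the model is trained to minimize

$$\mathcal{L}(\theta) = -\sum_{t} \log p_\theta(x_t \mid x_{<t})$$

summed over documents $x$ drawn from the corpus. Gradient updates on this loss only push probability mass toward continuations that actually occur in the training data, so a completion asserting "I am an LLM" should only be reinforced to the extent that such statements, or text that makes them likely, appear in the corpus.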