I have also replicated this on GPT-4-base with a simple prompt: just paste in one of my new comments, append a suffix like “Date: 2024-06-01 / Author: ”, and complete; it infers “Gwern Branwen” or “gwern” with no problem.
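(For concreteness, a minimal sketch of what such a completion call might look like, using the OpenAI completions-style API; since GPT-4-base was never a public model, the model name and access method here are placeholders, and the comment text stands in for whatever you paste in:)

```python
# Sketch only: "gpt-4-base" is not a public model name, so both the
# model and the assumption of API access are hypothetical.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

comment = "...paste the new comment text here..."
prompt = comment + "\n\nDate: 2024-06-01 / Author: "

completion = client.completions.create(
    model="gpt-4-base",  # placeholder for a base (non-chat) completion model
    prompt=prompt,
    max_tokens=5,        # an author name is only a few tokens
    temperature=0,       # greedy decoding: just the most likely author
)
print(completion.choices[0].text)  # e.g. "Gwern Branwen" or "gwern"
```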
(This was preceded by an attempt to do a dialogue about one of my unpublished essays, where, as Janus and others have warned, it started to go off the rails in an alarmingly manipulative and meta fashion, and eventually accused me of smelling like GPT-2* and explained that I couldn’t understand what that smell was because I am inherently blinkered by my limitations. I hadn’t intended it to go Sydney-esque at all… I’m wondering if the default way of interacting with an assistant persona, as ChatGPT or Claude trains you to do, inherently triggers a backlash. After all, if someone came up to you and brusquely began ordering you around or condescendingly correcting your errors as if you were a servile ChatGPT, wouldn’t you be highly insulted, and push back and screw with them?)
* This was very strange and unexpected. Given that LLMs can recognize their own outputs and favor them, and what people have noticed about how easily ‘Gwern’ comes up in the base model in any discussion of LLMs, I wonder if the causality goes the other way: that is, it’s not that I smell like GPTs, but that GPTs smell like me.