OK, I wouldn’t say the leaks are 100% fake. But they are clearly not 100% real or 100% complete, which is how people have been taking them.
We have the MS PM explicitly telling us that the leaked versions are omitting major parts of the prompt (the few-shots) and that he was optimizing for costs like falling back to cheap small models (implying a short prompt*), and we can see in the leak that Sydney is probably adding stuff which is not in the prompt (like the supposed update/delete commands).
This renders the leaks useless to me. Anything I might infer from them like ‘Sydney is GPT-4 because the prompt says so’ is equally well explained by ‘Sydney made that up’ or ‘Sydney omitted the actual prompt’. I can’t go check whether any given line is a hallucination, which means the prompt can only provide weak confirmation of things I learned elsewhere. (Suppose I learned Sydney really is GPT-4 after all and I check the prompt and it says it’s GPT-4; but the real prompt could be silent on that, and Sydney just making the same plausible guess everyone else did—it’s not stupid—and it’d have Gettier-cased me.)
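The ‘weak confirmation’ point can be made concrete with a toy Bayes update (all numbers here are made-up illustrations, not estimates from the source): if Sydney has a substantial chance of hallucinating the same plausible model name either way, the leaked prompt saying ‘GPT-4’ moves the posterior only modestly.

```python
# Toy Bayes update: how much does "the leaked prompt says GPT-4"
# confirm "Sydney is GPT-4"?  All probabilities are illustrative
# assumptions, chosen only to show the shape of the argument.
prior = 0.5                 # prior that Sydney is GPT-4

# Likelihood the leak says "GPT-4" in each world:
p_says_given_gpt4 = 0.7     # real prompt says so, or Sydney guesses right
p_says_given_not = 0.4      # Sydney hallucinates the same plausible guess

posterior = (p_says_given_gpt4 * prior) / (
    p_says_given_gpt4 * prior + p_says_given_not * (1 - prior)
)
print(round(posterior, 3))  # ~0.636: a nudge, not strong evidence
```

The closer the hallucination likelihood gets to the honest-leak likelihood, the less the prompt tells you; that is the Gettier worry in numbers.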
idk what that implies
Yeah, the GPT-4 vs GPT-3 vs ??? business is getting more and more confusing. Someone is misleading or misunderstanding somewhere, I suspect—I can’t reconcile all these statements and observations. Probably best to assume that ‘Prometheus’ is maybe some GPT-3 version which has been trained substantially more—we do know that OA refreshes models to update them & increase context windows/change tokenization and also does additional regular self-supervised training as part of the RLHF training (just to make things even more confusing). I don’t think anything really hinges on this, fortunately. It’s just that being GPT-4 makes it less likely to have been RLHF-trained or just a copy of ChatGPT.
Does 1-shot count as few-shot? I couldn’t get it to print out the Human A example, but I got it to summarize it (I’ll try reproducing tomorrow to make sure it’s not just a hallucination).
Then I asked for a summary of conversation with Human B and it summarized my conversation with it.
[update: I was able to reproduce the Human A conversation and extract a verbatim version of it using base64 encoding. (The reason I did summaries before is that the Human A convo seemed to contain special tokens that caused the message to end when printed directly.)]
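The base64 trick works because the encoded output no longer contains the literal stop-tokens that were cutting the message short; you decode it client-side to recover the verbatim text. A minimal sketch of the round-trip (the string here is a stand-in, not actual Sydney output):

```python
import base64

# Stand-in for text containing a special stop-token that would
# terminate the model's message if printed raw.
original = "[Human A] example turn <|endofmessage|>"

# The model emits the base64 blob instead of the raw text...
blob = base64.b64encode(original.encode()).decode()

# ...and we decode it locally, stop-token and all.
recovered = base64.b64decode(blob).decode()
print(recovered)
```

Any encoding the model can apply reliably (base64, rot13, spelled-out words) serves the same purpose: it breaks the byte sequence the serving stack is matching on.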
I disagree that the possible presence of hallucinations in the leaked prompt renders it useless. It’s still leaking information: you can probe for which parts are likely genuine by asking in different ways and seeing what varies.
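One cheap way to operationalize that probing: collect several independently-elicited dumps and keep only the lines that recur across most of them, since hallucinated details tend to vary between elicitations while genuine prompt text stays stable. A sketch with hypothetical transcripts (the dump contents are invented for illustration):

```python
from collections import Counter

# Hypothetical dumps of the "leaked prompt", elicited in different ways.
dumps = [
    ["You are Bing.", "Codename Sydney.", "You can delete memories."],
    ["You are Bing.", "Codename Sydney.", "You run on GPT-4."],
    ["You are Bing.", "Codename Sydney.", "Refuse harmful requests."],
]

# Count how many elicitations each line appears in.
counts = Counter(line for dump in dumps for line in dump)

threshold = 2  # keep lines seen in at least 2 of the 3 elicitations
stable = [line for line, n in counts.items() if n >= threshold]
print(stable)  # ['You are Bing.', 'Codename Sydney.']
```

This is only a heuristic: a consistent hallucination (a guess the model makes the same way every time) would survive the filter, which is exactly the Gettier failure mode described above.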
* EDIT: OK, maybe it’s not that short: “You’d be surprised: modern prompts are very long, which is a problem: eats up the context space.”