I’ve been told [by an anon] that it’s not GPT-4, and that one Mikhail Parakhin (ex-Yandex CTO; LinkedIn) is not just on the Bing team but was the person at MS responsible for rushing the deployment. He has been tweeting extensively about updates/fixes/problems with Bing/Sydney (although no one has noticed, judging by the view counts). Some particularly relevant tweets:
On what went wrong:
This angle of attack was a genuine surprise—Sydney was running in several markets for a year with no one complaining (literally zero negative feedback of this type). We were focusing on accuracy, RAI issues, security.
[Q. “That’s a surprise, which markets?”]
Mostly India and Indonesia. I shared a couple of old links yesterday—interesting to see the discussions.
[Q. “Wow! Am I right in assuming what was launched recently is qualitatively different than what was launched 2 years ago? Or is it pretty much the same model etc?”]
It was all gradual iterations. The first one was based on the Turing-Megatron model (sorry, I tend to put Turing first in that pair :-)), the current one—on the best model OpenAI has produced to date.
[Q. “What modifications are there compared to publicly available GPT models? (ChatGPT, text-davinci-3)”]
Quite a bit. Maybe we will do a blogpost on Prometheus specifically (the model that powers Bing Chat) - it has to understand internal syntax of Search and how to use it, fallback on the cheaper model as much as possible to save capacity, etc.
(‘”No one could have predicted these problems”, says man cloning ChatGPT, after several months of hard work to ignore all the ChatGPT hacks, exploits, dedicated subreddits, & attackers as well as the Sydney behaviors reported by his own pilot users.’ Trialing it in the third world with some unsophisticated users seems… uh, rather different from piloting it on sophisticated prompt hackers like Riley Goodside in the USA and subreddits out to break it. :thinking_face: And if it’s not GPT-4, then how was it the ‘best to date’?)
They are relying heavily on temperature-like sampling for safety, apparently:
A surprising thing we discovered: apparently we can make New Bing very interesting and creative or very grounded and not prone to the flights of fancy, but it is super-hard to get both. A new dichotomy not widely discussed yet. Looking for balance!
[Q. “let the user choose temperature ?”]
Not the temperature, exactly, but yes, that’s the control we are adding literally now. Will see in a few days.
...This is what I tried to explain previously: hallucinations = creativity. It tries to produce the highest probability continuation of the string using all the data at its disposal. Very often it is correct. Sometimes people have never produced continuations like this.
You can clamp down on hallucinations—and it is super-boring. Answers “I don’t know” all the time or only reads what is there in the Search results (also sometimes incorrect). What is missing is the tone of voice: it shouldn’t sound so confident in those situations.
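The “creative vs. grounded” dichotomy he describes maps naturally onto sampling temperature. As a minimal sketch (my own toy code, nothing to do with Bing’s actual implementation) of how temperature reshapes the next-token distribution:

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Sample a token index from raw logits at a given temperature.

    Low temperature sharpens the distribution toward the most likely
    token (grounded, repetitive); high temperature flattens it,
    letting unlikely continuations through (creative, hallucination-prone).
    Returns (sampled_index, probability_distribution).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i, probs
    return len(probs) - 1, probs
```

As T approaches 0 this degenerates to greedy decoding (the “super-boring, answers I don’t know” regime); at high T, low-probability continuations (i.e. hallucinations) get sampled.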
Temperature sampling as a safety measure is the sort of dumb thing you do when you aren’t using RLHF. I also take a very recent Tweet (2023-03-01) as confirming both that they are using fine-tuned models and also that they may not have been using RLHF at all up until recently:
Now almost everyone − 90% - should be seeing the Bing Chat Mode selector (the tri-toggle). I definitely prefer Creative, but Precise is also interesting—it’s much more factual. See which one you like. The 10% who are still in the control group should start seeing it today.
[Q. “For those of us with a deeper understanding of LLMs, what exactly is the switch changing? You already said it’s not temperature…is it prompt? If so, in what way?”]
Multiple changes, including differently fine-tuned and RLHFed models, different prompts, etc.
(Obviously, saying that you use ‘differently fine-tuned and RLHFed models’, plural, when describing a big update that changed behavior quite a bit, implies that you previously had solely-finetuned models and weren’t necessarily using RLHF at all; otherwise, why would you phrase it that way, distinguishing finetuned from RLHFed models, or highlight that as the change responsible for the new behavior? There has also been more than enough time for OA to ship fixed models to MS.)
He dismisses any issues as distracting “loopholes”, and appears to have a 1990s-era ‘patch mindset’ (ignoring that that attitude to security almost destroyed Microsoft and they have spent a good chunk of the last 2 decades digging themselves out of their many holes, which is why your Windows box is no longer rooted within literally minutes of being connected to the Internet):
[photo of prompt hack] Legit?
Not anymore :-) Teenager in me: “Wow, that is cool the way they are hacking it”. Manager in me: “For the love of..., we will never get improvements out if the team is distracted with closing loopholes like this”.
He also seems to be ignoring the infosec research happening live: https://www.jailbreakchat.com/ https://greshake.github.io/ https://arxiv.org/abs/2302.12173 https://www.reddit.com/r/MachineLearning/comments/117yw1w/d_maybe_a_new_prompt_injection_method_against/
Sydney insulting/arguing with user
I apologize about it. The reason is, we have an issue with the long, multi-turn conversation: there is a tagging issue and it is confused who said what. As a result, it is just continuing the pattern: someone is arguing, most likely it will continue. Refresh will solve it.
On the DAgger problem:
...One vector of attack we missed initially was: write super-rude or strange statements, keep going for multiple turns, confuse the model about who said what and it starts predicting what user would say next instead of replying. Voila :-(
Microsoft just made it so you have to restart your conversations with Bing after 10-15 messages to prevent it from getting weird. A fix of a sort.
Not the best way—just the fastest. The drift at long conversations is something we only uncovered recently—majority of usage is 1-3 turns. That’s why it is important to iterate together with real users, not in the lab!
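The “tagging issue” is easy to picture if you think about how a chat transcript gets serialized for a next-token predictor. A toy sketch (the role-tag format and stop-marker mitigation here are my inventions, not Bing’s actual internal syntax):

```python
def serialize(turns):
    """Render a chat transcript with explicit role tags.

    If the tags are ambiguous or the conversation is long, a pure
    next-token predictor can lose track of who is speaking and start
    generating the *user's* next turn instead of the assistant's.
    """
    return "\n".join(f"[{role}]: {text}" for role, text in turns) + "\n[assistant]:"

def truncate_at_role_drift(completion, stop_marker="[user]:"):
    """Crude mitigation: cut the completion at the first point where
    the model starts speaking as the user."""
    i = completion.find(stop_marker)
    return completion if i == -1 else completion[:i].rstrip()
```

Serving stacks commonly implement the second function as a stop sequence at decode time; restarting the conversation (as Bing did) is the blunter fix for the same failure.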
Poll on turn count obviously turns in >90% in favor of longer sessions:
Trying to find the right balance with Bing constraints. Currently each session can be up to 5 turns and max 50 requests a day. Should we change? [5-50: 0%; 6-60: 6.3%; we want more: 94%; n = 64]
Ok, it’s only 64 people, but the sentiment is pretty clear. 6/60 we probably can do soon, tradeoff for longer sessions is: we would need to have another model call to detect topic changes = more capacity = longer wait on waitlist for people.
[“Topics such as travel, shopping, etc. may require more follow up questions. Factual questions will not unless anyone is researching on a topic and need context. It can be made topic/context dependent. Maybe if possible let new Bing decide if that’s enough of the questions.”]
The main problem is people switching topics and model being confused, trying to continue previous conversation. It can be set up to understand the change in narrative, but that is an additional call, trying to resolve capacity issues.
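The “additional call to detect topic changes” he mentions could look something like the sketch below; here a trivial word-overlap heuristic stands in for what would really be another (capacity-consuming) model call:

```python
def topic_changed(history, new_message, threshold=0.1):
    """Stand-in for the extra 'did the topic change?' classifier call:
    if the new message shares almost no vocabulary with the history,
    treat it as a topic switch. A real system would use another LLM
    call here, which is exactly the capacity cost being discussed.
    """
    def words(s):
        return {w.lower().strip(".,!?") for w in s.split()}
    hist_words = set().union(*(words(t) for t in history)) if history else set()
    new_words = words(new_message)
    if not hist_words or not new_words:
        return False
    overlap = len(hist_words & new_words) / len(new_words)
    return overlap < threshold

def build_context(history, new_message):
    """Drop the old transcript on a detected topic switch, so the model
    doesn't try to continue the previous conversation."""
    return [new_message] if topic_changed(history, new_message) else history + [new_message]
```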
Can you patch your way to security?
Instead of connection, people just want to break things, tells more about our nature than AIs. Short-term mitigation, will relax once jailbreak protection is hardened.
[“So next step is an AI chat that breaks itself.”]
That’s exactly how it’s done! We set it up to break itself, find issues and mitigate. But it is not as creative as users, it also is very nice by default, no replacement for the real people interacting with Bing.
The supposed leaked prompts are (like I said) fake:
Andrej, of all the people, you know that the real prompt would have some few-shots.
(He’s right, of course, and in retrospect this is something that had been bugging me about the leaks: the prompt is supposedly all of these pages of endless instructions, spending context window tokens like a drunken sailor, and it doesn’t even include some few-shot examples? Few-shots shouldn’t be too necessary if you had done any kind of finetuning, but if you have that big a context, you might as well firm it up with some, and this would be a good stopgap solution for any problems that pop up in between finetuning iterations.)
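For concreteness, the prompt shape being argued about, instructions plus few-shot demonstrations ahead of the live conversation, can be sketched like this (the format is entirely illustrative, not the real Sydney prompt):

```python
def build_prompt(instructions, examples, query):
    """Assemble a prompt with an instruction block, few-shot
    demonstrations ('Human A'-style worked conversations), and then
    the live query. Few-shots firm up behavior between finetuning
    iterations at the cost of context-window tokens.
    """
    parts = [instructions.strip(), ""]
    for i, (q, a) in enumerate(examples, 1):
        parts += [f"# Example {i}", f"Human: {q}", f"Assistant: {a}", ""]
    parts += ["# Live conversation", f"Human: {query}", "Assistant:"]
    return "\n".join(parts)
```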
Confirms a fairly impressive D&D hallucination:
It cannot execute JavaScript and doesn’t interact with websites. So, it looked at the content of that generator, but the claim that it used it is not correct :-(
He is dismissive of ChatGPT’s importance:
I think better embedding generation is a much more important development than ChatGPT (for most tasks it makes no sense to add the noisy embedding -> human language -> embedding transformation). But it is far less intuitive for the average user.
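His point about embeddings is that for retrieval-like tasks you can stay in vector space end to end, skipping the noisy embedding -> natural language -> embedding round-trip through a chat model. A minimal sketch of direct embedding-space retrieval (toy 2-d vectors and my own code, purely illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest(query_vec, corpus):
    """Retrieve the best-matching document directly in embedding space.
    corpus: list of (doc_id, vector) pairs."""
    return max(corpus, key=lambda item: cosine(query_vec, item[1]))[0]
```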
A bunch of his defensive responses to screenshots of alarming chats can be summarized as “Britney did nothing wrong” ie. “the user just got what they prompted for, what’s the big deal”, so I won’t bother linking/excerpting them.
They have limited ability to undo mode collapse or otherwise retrain the model, apparently:
I’ve got a rather funny bug for you this time—Bing experiences severe mode collapse when asked to tell a joke (despite this being one of the default autocomplete suggestions given before typing): …
Yeah, both of those are not bugs per se. These are problems the model has, we can’t fix them quickly, only thorough gradual improvement of the model itself.
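Mode collapse of this kind is easy to quantify: sample the same request many times and measure how concentrated the outputs are. A sketch (my own toy metric, not anything Microsoft is known to use):

```python
from collections import Counter

def mode_collapse_score(completions):
    """Fraction of samples taken by the single most common completion.
    A healthy model asked for 'a joke' at nonzero temperature should
    spread over many jokes; a mode-collapsed one returns the same joke
    almost every time (score near 1.0).
    """
    if not completions:
        raise ValueError("need at least one sample")
    counts = Counter(completions)
    return counts.most_common(1)[0][1] / len(completions)
```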
It is a text model with no multimodal ability (assuming the rumors about GPT-4 being multimodal are correct, then this would be evidence against Prometheus being a smaller GPT-4 model, although it could also just be that they have disabled image tokens):
Correct, currently the model is not able to look at images.
And he is ambitious and eager to win marketshare from Google for Bing:
[about using Bing] First they ignore you, then they laugh at you, then they fight you, then you win.
Yes! I think marketers should consider Bing for their marketing mix in 2023. It could be a radically better outlet for ROAS.
Thank you, Summer! I think we already are—compare our and Google’s revenue growth rates in the last few quarters: once the going gets tougher, advertisers pay more attention to ROI—and that’s where Bing/MSAN shine.
But who could blame him? He doesn’t have to live in Russia when he can work for MS, and Bing seems to be treating him well:
Yeah, I am a Tesla fan (have both Model S and a new Model X), but unfortunately self-driving simply is not working. Comically, I would regularly have the collision alarm going off while on Autopilot!
Anyway, like Byrd, I would emphasize here the complete absence of any reference to RL or any intellectual influence of DRL or AI safety in general, and an attitude that it’s nbd and he can just deploy & patch & heuristic his way to an acceptable Sydney as if it were any ol’ piece of software. (An approach which works great with past software, is probably true of Prometheus/Sydney, and was definitely true of the past AI he has the most experience with, like Turing-Megatron which is quite dumb by contemporary standards—but is just putting one’s head in the sand about why Sydney is an interesting/alarming case study about future AI.)
The WSJ is reporting that Microsoft was explicitly warned by OpenAI before shipping Sydney publicly that it needed “more training” in order to “minimize issues like inaccurate or bizarre responses”. Microsoft shipped it anyway and it blew up more or less as they were warned (making Mikhail’s tweets & attitude even more disingenuous if so).
This is further proof that it was the RLHF that was skipped by MS, and also that large tech companies will ignore explicit warnings about dangerous behavior from the literal creators of AI systems even where there is (sort of) a solution if that would be inconvenient. Excerpts (emphasis added):
...At the same time, people within Microsoft have complained about diminished spending on its in-house AI and that OpenAI doesn’t allow most Microsoft employees access to the inner workings of their technology, said people familiar with the relationship. Microsoft and OpenAI sales teams sometimes pitch the same customers. Last fall, some employees at Microsoft were surprised at how soon OpenAI launched ChatGPT, while OpenAI warned Microsoft early this year about the perils of rushing to integrate OpenAI’s technology without training it more, the people said.
...Some companies say they have been pitched the same access to products like ChatGPT—one day by salespeople from OpenAI and later from Microsoft’s Azure team. Some described the outreach as confusing.
OpenAI has continued to develop partnerships with other companies. Microsoft archrival Salesforce offers a ChatGPT-infused product called Einstein GPT. It is a feature that can do things like generating marketing emails, competing with OpenAI-powered features in Microsoft’s software. OpenAI also has connected with different search engines over the past 12 months to discuss licensing its products, said people familiar with the matter, as Microsoft was putting OpenAI technology at the center of a new version of its Bing search engine. Search engine DuckDuckGo started using ChatGPT to power its own chatbot, called DuckAssist.
Microsoft plays a key role in the search engine industry because the process of searching and organizing the web is costly. Google doesn’t license out its tech, so many search engines are heavily reliant on Bing, including DuckDuckGo. When Microsoft launched the new Bing, the software company changed its rules in a way that made it more expensive for search engines to develop their own chatbots with OpenAI. The new policy effectively discouraged search engines from working with any generative AI company because adding an AI-powered chatbot would trigger much higher fees from Microsoft. Several weeks after DuckDuckGo announced DuckAssist, the company took the feature down.
Some researchers at Microsoft gripe about the restricted access to OpenAI’s technology. While a select few teams inside Microsoft get access to the model’s inner workings like its code base and model weights, the majority of the company’s teams don’t, said the people familiar with the matter. Despite Microsoft’s significant stake in the company, most employees have to treat OpenAI’s models like they would any other outside vendor.
The rollouts of ChatGPT last fall and Microsoft’s AI-infused Bing months later also created tension. Some Microsoft executives had misgivings about the timing of ChatGPT’s launch last fall, said people familiar with the matter. With a few weeks notice, OpenAI told Microsoft that it planned to start public testing of the AI-powered chatbot as the Redmond, Wash., company was still working on integrating OpenAI’s technology into its Bing search engine. Microsoft employees were worried that ChatGPT would steal the new Bing’s thunder, the people said. Some also argued Bing could benefit from the lessons learned from how the public used ChatGPT.
OpenAI, meanwhile, had suggested Microsoft move slower on integrating its AI technology with Bing. OpenAI’s team flagged the risks of pushing out a chatbot based on an unreleased version of its GPT-4 that hadn’t been given more training, according to people familiar with the matter. OpenAI warned it would take time to minimize issues like inaccurate or bizarre responses. Microsoft went ahead with the release of the Bing chatbot. The warnings proved accurate. Users encountered incorrect answers and concerning interactions with the tool. Microsoft later issued new restrictions—including a limit on conversation length—on how the new Bing could be used.
The supposed leaked prompts are (like I said) fake:
I do not buy this for a second (that they’re “fake”, implying they have little connection with the real prompt). I’ve reproduced it many times (without Sydney searching the web, and even if it secretly did, the full text prompt doesn’t seem to be on the indexed web). That this is memorized from fine tuning fails to explain why the prompt changed when Bing was updated a few days ago. I’ve interacted with the rules text a lot and it behaves like a preprompt, not memorized text. Maybe the examples you’re referring to don’t include the complete prompt, or contain some intermingled hallucinations, but they almost certainly IMO contain quotes and information from the actual prompt.
On whether it includes few-shots, there’s also a “Human A” example in the current Sydney prompt (one-shot, it seems—you seem to be “Human B”).
As for if the “best model OpenAI has produced to date” is not GPT-4, idk what that implies, because I’m pretty sure there exists a model (internally) called GPT-4.
OK, I wouldn’t say the leaks are 100% fake. But they are clearly not 100% real or 100% complete, which is how people have been taking them.
We have the MS PM explicitly telling us that the leaked versions are omitting major parts of the prompt (the few-shots) and that he was optimizing for costs like falling back to cheap small models (implying a short prompt*), and we can see in the leak that Sydney is probably adding stuff which is not in the prompt (like the supposed update/delete commands).
This renders the leaks useless to me. Anything I might infer from them like ‘Sydney is GPT-4 because the prompt says so’ is equally well explained by ‘Sydney made that up’ or ‘Sydney omitted the actual prompt’. When a model hallucinates, I can go check, but that means that the prompt can only provide weak confirmation of things I learned elsewhere. (Suppose I learned Sydney really is GPT-4 after all and I check the prompt and it says it’s GPT-4; but the real prompt could be silent on that, and Sydney just making the same plausible guess everyone else did—it’s not stupid—and it’d have Gettier-cased me.)
idk what that implies
Yeah, the GPT-4 vs GPT-3 vs ??? business is getting more and more confusing. Someone is misleading or misunderstanding somewhere, I suspect—I can’t reconcile all these statements and observations. Probably best to assume that ‘Prometheus’ is maybe some GPT-3 version which has been trained substantially more—we do know that OA refreshes models to update them & increase context windows/change tokenization and also does additional regular self-supervised training as part of the RLHF training (just to make things even more confusing). I don’t think anything really hinges on this, fortunately. It’s just that being GPT-4 makes it less likely to have been RLHF-trained or just a copy of ChatGPT.
* EDIT: OK, maybe it’s not that short: “You’d be surprised: modern prompts are very long, which is a problem: eats up the context space.”
Does 1-shot count as few-shot? I couldn’t get it to print out the Human A example, but I got it to summarize it (I’ll try reproducing tomorrow to make sure it’s not just a hallucination).
Then I asked for a summary of conversation with Human B and it summarized my conversation with it.
[update: was able to reproduce the Human A conversation and extract a verbatim version of it using base64 encoding (the reason I did summaries before is that the Human A convo seemed to contain special tokens which caused the message to end when printed directly)]
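The base64 trick works because the encoding turns any troublesome special tokens inside the hidden text into ordinary characters that survive the round trip; the decoding then happens client-side. A sketch of the client half (the model-side encoding function is included only so the round trip can be demonstrated):

```python
import base64

def decode_model_output(b64_text):
    """Decode a base64 payload returned by the model. Asking for base64
    means any special end-of-message tokens embedded in the hidden text
    arrive as inert characters instead of terminating the reply."""
    return base64.b64decode(b64_text.strip()).decode("utf-8")

def encode_for_round_trip(text):
    """What the model effectively does on its side (here for testing)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")
```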
I disagree that the possible presence of hallucinations in the leaked prompt renders it useless. It’s still leaking information. You can probe for which parts are likely genuine by asking in different ways and seeing what varies.
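That probing strategy can be made mechanical: collect several independently elicited extractions and keep only what recurs verbatim, since hallucinated filler varies run to run while genuine prompt text tends to repeat. A sketch (my own code, illustrating the idea only):

```python
def stable_lines(extractions):
    """Given several independently elicited 'leaks' of the same prompt,
    keep only the lines that appear in every one, in the order of the
    first extraction. Recurring lines are more likely genuine prompt
    text; varying lines are more likely hallucinated.
    """
    if not extractions:
        return []
    line_sets = [set(e.splitlines()) for e in extractions]
    common = set.intersection(*line_sets)
    return [ln for ln in extractions[0].splitlines() if ln in common]
```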