I read this post in full back in February. It’s very comprehensive. Thanks again to Zvi for compiling all of these.
To this day, it’s infuriating that we don’t have any explanation whatsoever from Microsoft/OpenAI on what went wrong with Bing Chat. Bing clearly did a bunch of actions its creators did not want. Why? Bing Chat would be a great model organism of misalignment. I’d be especially eager to run interpretability experiments on it.
The whole Bing Chat fiasco also gave me the impetus to look deeper into AI safety (although I think absent Bing, I would’ve come around to it eventually).