Hostile/threatening behavior is surely a far more serious misalignment from Microsoft’s perspective than anything else, no?
No. I’d expect the most serious misalignment from Microsoft’s perspective is a hallucination which someone believes, and which incurs material damage as a result, which Microsoft can then be sued over. Hostile language from the LLM is arguably a bad look in terms of PR, but not obviously particularly bad for the bottom line.
That said, if this was your reasoning behind including so many examples of hostile/threatening behavior, then from my perspective that at least explains-away the high proportion of examples which I think are easily misinterpreted.
No. I’d expect the most serious misalignment from Microsoft’s perspective is a hallucination which someone believes, and which incurs material damage as a result, which Microsoft can then be sued over. Hostile language from the LLM is arguably a bad look in terms of PR, but not obviously particularly bad for the bottom line.
Obviously we can always play the game of inventing new possible failure modes that would be worse and worse. The point, though, is that the hostile/threatening failure mode is quite bad and new relative to previous models like ChatGPT.
No. I’d expect the most serious misalignment from Microsoft’s perspective is a hallucination which someone believes, and which incurs material damage as a result, which Microsoft can then be sued over. Hostile language from the LLM is arguably a bad look in terms of PR, but not obviously particularly bad for the bottom line.
That said, if this was your reasoning behind including so many examples of hostile/threatening behavior, then from my perspective that at least explains-away the high proportion of examples which I think are easily misinterpreted.
Obviously we can always play the game of inventing new possible failure modes that would be worse and worse. The point, though, is that the hostile/threatening failure mode is quite bad and new relative to previous models like ChatGPT.
Why do you think these aren’t tightly correlated? I think PR is pretty important to the bottom line for a product in the rollout phase.