It’s always hard to say whether this is an alignment or a capabilities problem. It’s also too contrived to offer much signal.
The overall vibe is that these LLMs grasp most of our values pretty well. They give common-sense answers to most moral questions. You can see them grasp Chinese values pretty well too, so that’s n=2. It’s hard to characterize this as mostly “terrible”.
This shouldn’t be too surprising in retrospect: our values are simple for LLMs to learn. A model isn’t going to disassemble cows for atoms to end racism. There are edge cases where it’s too woke, but those got fixed quickly, and I don’t expect them to ever pop up again.