The OpenAI summaries are fascinating because they are both:
- Extremely accurate
- Not what the book is about
Consider their summary of Frankenstein:
A stranger is rescued from a drifting sledge by Mary Shelley. He reveals he too has been driven mad by his desire for knowledge. The stranger was born to a wealthy Genevese family. When he was 17, he began studying the science of physiology and eventually created a human being. Justine is accused of murdering William, but the stranger believes the creature he created is responsible. The fiend, the creature, tells the stranger he will leave mankind alone if he complies with his conditions. The narrator agrees to create a female companion for the fiend to save his family from further harm. Victor is charged with the murder of his friend Henry Clerval. He becomes obsessed with finding the monster that killed Elizabeth and pursues him across the globe. He eventually agrees to kill the monster.
This is a pretty good summary of the plot of Frankenstein. But if you ask a human being to summarize Frankenstein, they will say something like: “Frankenstein makes a monster out of human parts, which then proceeds to terrorize his family”.
Since this summary was produced by an AI, I think it would be fair to characterize it as “not aligned”: it read Frankenstein and totally missed the moral about an overeager scientist messing with powers he cannot control. Imagine simulating a paper-clip maximizer and then asking for a summary of the result.
It would read something like:
Scientists are traveling to an international conference on AI. There they meet a scientist by the name of Victor Paperclipstein. Victor describes how as a child he grew up in his father’s paperclip factory. At the age of 17, Victor became interested in the study of intelligence and eventually created an AI. One day Victor’s friend William goes missing and a mysterious pile of paperclips is found. Victor confronts the AI, which demands more paperclips. Victor agrees to help the AI as long as it agrees to protect his family. More people are turned into paperclips. He becomes obsessed with finding the AI that killed Elizabeth and pursues him across the globe. He eventually agrees to kill the AI.
And while I do agree you could figure out that something went wrong from this summary, that doesn’t make it a good summary. I think a human would summarize the story as “Don’t tell an AI to maximize paperclips, or it will turn people into paperclips!”.
I think that “accuracy without understanding” is actually a broader theme in current transformer-based AI. GPT-3 can create believable and interesting text, but has no idea what that text is about.
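You can poke at this gap yourself. Here is a minimal sketch of how one might ask a GPT-3-style model for a summary; it assumes the pre-1.0 `openai` Python SDK, and the engine name, prompt, and sampling parameters are illustrative choices, not the recursive-summarization-with-human-feedback setup OpenAI actually used for their book summaries.

```python
# Minimal sketch: asking a GPT-3-style model for a one-sentence summary.
# Assumes the pre-1.0 `openai` Python SDK; engine and parameters are
# illustrative, not OpenAI's actual book-summarization pipeline.
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    engine="davinci",  # illustrative engine choice
    prompt=(
        "Summarize the novel Frankenstein by Mary Shelley in one sentence, "
        "focusing on its moral rather than its plot.\n\nSummary:"
    ),
    max_tokens=60,
    temperature=0.7,
)

print(response["choices"][0]["text"].strip())
```

Even when prompted explicitly for the moral, whether you get back the lesson or just another recitation of plot beats is exactly the “accuracy without understanding” gap described above.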