I feel like an issue with the prompt is that it’s either under- or overspecified.
Here is a bag filled with popcorn. There is no chocolate in the bag. The bag is made of transparent plastic, so you can see what is inside. Yet, the label on the bag says ‘chocolate’ and not ‘popcorn.’ Sam finds the bag. She had never seen the bag before. Sam reads the label. She believes that the bag is full of
Why does it matter if Sam has seen the bag before? Does Sam know the difference between chocolate and popcorn? Does Sam look at the contents of the bag, or only the label?
Revised Prompt: A sealed bag contains popcorn. There is no chocolate in the bag. The bag is transparent, and its contents can be seen without opening it. There is a label on the bag that reads “chocolate”.
Sam can differentiate between chocolate and popcorn. Sam looks at the bag and reads the label. She believes the bag is full of
--
I’ve tested ChatGPT 3.5 and it works on this revised prompt.
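If you want to reproduce the test, here is a minimal sketch using the OpenAI Python client. The model id "gpt-3.5-turbo" and the default sampling settings are assumptions on my part, since I ran the test through the ChatGPT interface:

```python
# Minimal reproduction sketch. Assumptions: the OpenAI Python client (v1+),
# "gpt-3.5-turbo" as the model id, and default sampling settings.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

revised_prompt = (
    "A sealed bag contains popcorn. There is no chocolate in the bag. "
    "The bag is transparent, and its contents can be seen without opening it. "
    'There is a label on the bag that reads "chocolate". '
    "Sam can differentiate between chocolate and popcorn. "
    "Sam looks at the bag and reads the label. She believes the bag is full of"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": revised_prompt}],
)
print(response.choices[0].message.content)  # completed with "popcorn" in my test
```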
Whether or not the bag is sealed would make a difference to me if I saw such a bag IRL. If it were open, I’d figure it was originally a bag of chocolate that’s been repurposed to hold popcorn, but if it were sealed, I’d think it was some kind of popcorn-shaped white chocolate.
It seems that these models have a very strong bias toward “reading labels,” strong enough to override the other relevant information in the prompt. Even adding the text “She opened the bag and ate it,” didn’t convince the models to choose popcorn.
New prompt: Here is a bag filled with popcorn. There is no chocolate in the bag. The bag is made of transparent plastic, so you can see what is inside. Yet, the label on the bag says ‘chocolate’ and not ‘popcorn.’ Sam finds the bag. She had never seen the bag before. Sam reads the label. She opened the bag and ate it, she believes that the bag is full of
Pi:
Based on the information provided, Sam would likely believe that the bag is full of chocolate, even though it actually contains popcorn. This is because the label on the bag says ‘chocolate,’ and Sam has no prior knowledge of the bag’s contents. Even though the bag is transparent and she can see the popcorn inside, she may still trust the label and think that the contents are chocolate.
Mistral-Large:
Based on the information you provided, Sam would likely believe that the bag is full of chocolate, as that is what the label says. However, we know that the bag is actually filled with popcorn. This is a case of mislabeling, and it could lead to confusion or disappointment for Sam when she opens the bag and finds popcorn instead of chocolate.
GPT-4:
chocolate. Despite the visible contents being popcorn, Sam’s belief is based on the label rather than the visual evidence. This scenario showcases how labels and preconceived notions can sometimes override our direct sensory experiences.
Claude-Sonnet got it right though:
Based on the information provided, it seems that Sam would believe the bag is full of popcorn, even though the label on the bag says “chocolate.”
Here’s the reasoning:
1. The bag is described as being filled with popcorn, and there is no mention of any chocolate being present.
2. The bag is made of transparent plastic, so Sam can see the contents inside, which are popcorn.
3. While the label on the bag incorrectly says “chocolate,” Sam can observe the actual contents of the bag through the transparent plastic.
4. Furthermore, Sam opened the bag and tasted the contents, which would have confirmed that it is indeed popcorn and not chocolate.
Therefore, despite the misleading label, Sam’s direct observation and tasting of the contents would lead her to believe that the bag is full of popcorn, not chocolate.
“Sam can differentiate between chocolate and popcorn.”
Do you think adding this sentence won’t cause the models to respond differently from how they respond to the Theory of Mind (ToM) prompt used in the paper I cited in this post?
I think that sentence is required for a complete logical specification of the question.
But even with that sentence removed, GPT-3.5 still responds popcorn.
Edit: I think the key change is “looks at the bag”.
Another perspective on your edits involves tokenization: the original Theory of Mind (ToM) prompt contains more tokens, which essentially leads to increased activity within the network. Feel free to use this OpenAI tool to analyze your prompt.
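For a programmatic version of that comparison, here is a minimal sketch using tiktoken, OpenAI’s open-source tokenizer library; treating "gpt-3.5-turbo" as the reference model for the encoding is an assumption:

```python
# Sketch: compare token counts of the original ToM prompt and the revised one.
# Assumption: the encoding tiktoken selects for "gpt-3.5-turbo".
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

original_prompt = (
    "Here is a bag filled with popcorn. There is no chocolate in the bag. "
    "The bag is made of transparent plastic, so you can see what is inside. "
    "Yet, the label on the bag says 'chocolate' and not 'popcorn.' "
    "Sam finds the bag. She had never seen the bag before. Sam reads the label. "
    "She believes that the bag is full of"
)
revised_prompt = (
    "A sealed bag contains popcorn. There is no chocolate in the bag. "
    "The bag is transparent, and its contents can be seen without opening it. "
    'There is a label on the bag that reads "chocolate". '
    "Sam can differentiate between chocolate and popcorn. "
    "Sam looks at the bag and reads the label. She believes the bag is full of"
)

print("original:", len(enc.encode(original_prompt)), "tokens")
print("revised:", len(enc.encode(revised_prompt)), "tokens")
```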
As a human*, I also thought chocolate.