Dall-E and other image generation models can’t reliably get cardinalities right beyond something like 3.
That’s just BPEs, and it’s worth noting that the other models (and by ‘other’, I do not mean “Stable Diffusion” like a lazy person would, I mean actual SOTAs) are much more capable of following long intricate prompts, e.g. Parti.
What’s going on with ChatGPT is also likely due to BPEs (as has been the case with GPT when asked to do letter-related tasks going back to GPT-2, as I have been repeating for years) plus perhaps some RLHF shenanigans. (RLHF seems to increase accuracy but sometimes greatly worsen results. I’m not yet sure why, but I suspect there may be something pathological in what RLHF rewards, where it is unable to do inner-monologue and incentivized to immediately jump to an answer.)
BPEs explain the string examples but I also couldn’t get ChatGPT to accurately count words.
Edit:
An example that I just tried with GPT-3.5-turbo:
User: How many times does the word “cat” occur in the following sentence: A cat was a cat in a cat with a cat as a hat and picture of a cat on the cat hat.
Assistant: The word “cat” occurs 5 times in the sentence.
Because apparently I don’t have it in me to concentrate on work today, here is some more experimentation with ChatGPT:
Me: How many times does the word “cat” occur in the following sentence: A cat was a cat in a cat with a cat as a hat and picture of a cat on the cat hat.
ChatGPT: The word “cat” occurs five times in the sentence.
Me: Could you repeat the sentence but put a number after each occurrence of “cat” counting up.
ChatGPT: Sure, here’s the sentence with a number after each occurrence of “cat” counting up:
“A cat (1) was a cat (2) in a cat (3) with a cat (4) as a hat and picture of a cat (5) on the cat hat.”
So I wondered whether “a cat” might be a single token for some reason, and tried:
Me: Now, count the number of occurrences of the word “permeable” in the following sentence: A permeable membrane is more permeable if it is more permeable rather than less permeable than a less permeable membrane is permeable.
ChatGPT: The word “permeable” appears four times in the sentence.
Me: Could you again repeat the sentence and put a number in brackets after each occurrence of “permeable” counting up?
ChatGPT: Sure, here’s the sentence with a number in brackets after each occurrence of “permeable” counting up:
“A permeable membrane is more permeable if it is [1] more permeable rather than less permeable than a less permeable membrane is [2] permeable.”
“Permeable” appears twice in the first clause and twice in the second clause of the sentence, so there are four occurrences of the word “permeable” in total.
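For reference, the counts the model gets wrong are trivial to verify programmatically. A quick Python sketch (the `count_word` helper is my own, purely illustrative) finds six occurrences of the target word in each test sentence, not the five and four ChatGPT reported:

```python
import re

def count_word(word: str, sentence: str) -> int:
    # \b word boundaries ensure we match whole words only
    # (so "cat" would not match inside "catalog").
    pattern = rf"\b{re.escape(word)}\b"
    return len(re.findall(pattern, sentence, flags=re.IGNORECASE))

cat_sentence = ("A cat was a cat in a cat with a cat as a hat "
                "and picture of a cat on the cat hat.")
permeable_sentence = ("A permeable membrane is more permeable if it is "
                      "more permeable rather than less permeable than a "
                      "less permeable membrane is permeable.")

print(count_word("cat", cat_sentence))              # 6
print(count_word("permeable", permeable_sentence))  # 6
```

Note that the sixth “cat” (in “the cat hat”) is exactly the one ChatGPT left unnumbered in its annotated version above, so its answer of five was wrong even by its own enumeration.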