I just tried ChatGPT 10 times. It said “line” 3/10 times. Of those 3 answers, 2 said the line would be curved (wrong, though a human might say that as well). The other 7 answers were mostly “ellipse” or “irregular shape” (which are not among the options), but “circle” appeared as well. Note that if ChatGPT guessed randomly among the options, it would get it right 2.5/10 times on average.
It’s perhaps not the best test of geometric reasoning, because it’s difficult for humans to understand the setup.
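To put the random-guess baseline in numbers, here is a rough sketch. It assumes four answer options (which is what the 2.5/10 figure implies) and independent trials; the names `binom_pmf`, `n_trials` and `p_guess` are just mine for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_trials = 10
p_guess = 1 / 4  # assumption: four options, picked uniformly at random

# Expected number of "line" answers under pure guessing.
expected_correct = n_trials * p_guess

# Chance that a pure guesser says "line" at least 3 times out of 10.
p_at_least_3 = sum(binom_pmf(k, n_trials, p_guess) for k in range(3, n_trials + 1))

print(expected_correct)
print(round(p_at_least_3, 3))
```

Under those assumptions a pure guesser hits 3 or more nearly half the time (roughly 0.47), so 3/10 on its own doesn't distinguish the model from guessing.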
Doesn’t prompting it to think step by step help in this case?
Not particularly, no. There are two reasons: (1) RLHF already tries to encourage the model to think step-by-step, which is why you often get long-winded multi-step answers to even simple arithmetic questions. (2) Thinking step by step only helps for problems that can be solved via easier intermediate steps. For example, solving “2x+5=5x+2” can be achieved via a sequence of intermediate steps; the model generally cannot solve such questions with a single forward pass, but it can do every intermediate step in a single forward pass each, so “think step by step” helps it a lot. I don’t think this applies to the ice cube question.
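To illustrate what I mean by easier intermediate steps, here is a small sketch that solves an equation of the form a·x + b = c·x + d one trivial move at a time. The function name `solve_linear_steps` is made up for this example, and it says nothing about what the model does internally; it only shows that each step is simple on its own.

```python
from fractions import Fraction

def solve_linear_steps(a, b, c, d):
    """Solve a*x + b = c*x + d, recording each intermediate step."""
    if a == c:
        raise ValueError("no unique solution when the x coefficients match")
    steps = [f"{a}x + {b} = {c}x + {d}"]
    # Step 1: subtract c*x from both sides.
    coeff = a - c
    steps.append(f"{coeff}x + {b} = {d}")
    # Step 2: subtract b from both sides.
    rhs = d - b
    steps.append(f"{coeff}x = {rhs}")
    # Step 3: divide both sides by the x coefficient.
    x = Fraction(rhs, coeff)
    steps.append(f"x = {x}")
    return steps, x

steps, x = solve_linear_steps(2, 5, 5, 2)
for s in steps:
    print(s)
```

For “2x+5=5x+2” this prints the chain 2x + 5 = 5x + 2, then -3x + 5 = 2, then -3x = -3, then x = 1. Each line follows from the previous one by a single simple operation.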