Fairly minor but I think I see an unmentioned error in the “41” section:
the first six positive even integers: 2 + 4 + 6 + 8 + 10 + 11 = 41
11 is not even (it seems to be thinking somewhat of 42?)
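For what it's worth, the inconsistency is easy to make explicit (a quick Python sanity check, nothing model-specific here):

```python
# The first six positive even integers and their actual sum.
evens = [2 * k for k in range(1, 7)]    # [2, 4, 6, 8, 10, 12]
print(evens, sum(evens))                # 42, not 41

# The list ChatGPT actually wrote: 11 is odd, and six even numbers
# can never sum to an odd total in the first place.
print(sum([2, 4, 6, 8, 10, 11]))        # 41
```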
Edit: Actually, the more I think about it, it’s a pretty interesting error. ChatGPT 4 produces much better answers than I would for the majority of these questions, but I don’t think I would make this error. If you asked it, I’m sure it would correctly explain that a sum of even integers cannot be odd, or that 11 is not even, etc., but I wonder whether (for example) a large amount of training text about the number 42 being the sum of the first six positive even integers was somehow “close” enough to “41” to overwhelm any emergent understanding of how to apply those concepts.
The way that LLM tokenization represents numbers is all kinds of stupid. It’s honestly kind of amazing to me that they don’t make even more arithmetic errors. Of course, an LLM can use a calculator just fine, and this is an extremely obvious way to enhance its general intelligence. I believe “give the LLM a calculator” is in fact being used in some cases, but either the LLM or some shell around it has to decide when to invoke the calculator and how to use its result. That apparently didn’t happen, or didn’t work properly, in this case.
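To illustrate what I mean about tokenization, here’s a rough sketch using the tiktoken library with the cl100k_base encoding (the one used by the GPT-4-era models); the exact splits depend on the tokenizer:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-family encoding

# Numbers get chopped into arbitrary multi-digit chunks rather than
# digits or place values, so there is no clean token-level structure
# for carrying, digit alignment, etc.
for s in ["41", "42", "2 + 4 + 6 + 8 + 10 + 12 = 42", "1234567"]:
    tokens = enc.encode(s)
    print(repr(s), "->", [enc.decode([t]) for t in tokens])
```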
Wild guess: It realised its mistake partway through, and followed through anyway as sensibly as could be done, balancing between giving a wrong calculation (“+ 12 = 41”), ignoring the central focus of the question (“+ 12 = 42”), and breaking from the list of even integers it was supposed to be going through. I suspect it would not make this error when using chain-of-thought.
Thanks, and sorry I missed that error. I’ve updated the post by bolding the error, and also HT’ed your contribution.