Thank you for trying so hard to replicate this little experiment of mine.
Perhaps you sent the prompt in the middle of a conversation rather than at the beginning? If the same list had also been sent earlier in the conversation, I can imagine it got the answer right because it had more time to ‘take in’ the numbers, or had otherwise established a context that guided it to the right answer.
Yes, this is exactly what I did. I made sure to use a new list of numbers for each question – I’d noticed that it would remember previous answers if I didn’t – but I didn’t ask each question in its own conversation. On one hand, it would have been cleaner if I’d started fresh each time; on the other, I wouldn’t have had that great moment where it wrote code and then hallucinated that it had executed it.
I hadn’t noticed your patterns when I ran mine. What I did notice is that its answer always included the first element of my list, usually (~80%) included the second, often (~40%) took the third from around the midpoint of the list, and always hallucinated the last one. That’s eerily similar to what I’d expect from the four-nested-loop algorithm (minus the hallucination).
That would explain why, when I asked it to find the “closest” sum, it picked three numbers and then, when their sum came up too small, picked the second-largest number from the list. It would also explain why the hallucinated number came last every single time.
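For concreteness, here’s roughly the brute-force shape we’re both gesturing at. This is a hypothetical Python sketch (the function name, sample list, and target are all made up for illustration), not anything the model actually executes; the point is just that the first quadruple such a search finds is the lexicographically earliest one, which is heavy on early list elements:

```python
# Hypothetical sketch of a four-nested-loop search for four numbers
# summing to a target. The first hit is the lexicographically earliest
# quadruple: it always contains the first element, usually the second,
# and so on -- matching the pattern described above.
def four_sum(nums, target):
    n = len(nums)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                for l in range(k + 1, n):
                    if nums[i] + nums[j] + nums[k] + nums[l] == target:
                        return [nums[i], nums[j], nums[k], nums[l]]
    return None  # a real search reports failure here

print(four_sum([3, 7, 12, 5, 9, 14, 2], 27))  # [3, 7, 12, 5]
```

Note the `return None` branch: an actual search admits when no quadruple works, whereas the model seems to commit to three early picks and invent a fourth.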
I agree with this.
Sometimes it gives what might be the laziest “breakdown” I’ve ever seen.
I laughed hard at this.