The post otherwise makes sense to me, but I’m confused by this bit:
It can do better if it’s allowed to run algorithms by “thinking out loud”. It’s really slow, and this is a good way to fill up its context buffer. The slowness is a real problem—if it outputs ~10 token/sec, it will take forever to solve any problems that are actually both big and hard. This is a neat trick, but it doesn’t seem like an important improvement to its capabilities.
Why not?
It seems like humans also run into the same problem—the brain can only do a limited amount of inference per unit of thought. We get around it by having a working memory, which we may extend by writing things down, to store intermediate steps of our reasoning so that we don’t have to simulate everything in one go. It seems to me that “thinking out loud” and writing things to its context buffer is what lets GPT have a working memory the same way that humans do.
And e.g. if someone instructs ChatGPT to first do one thing and then another thing—say, first generating an outline of a plan and then filling in intermediate steps of the plan—then they are effectively using it to solve problems that couldn’t be solved in constant time. Which to me seems like a huge improvement to its capabilities, since it lifts the restriction of “can only solve constant-time problems”.
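To make that concrete, here’s a rough sketch of the outline-then-fill pattern in Python. The `llm` callable is a hypothetical stand-in for whatever completion call you’d actually use; the point is just that the first call’s output becomes written-down context for the second.

```python
from typing import Callable

def plan_and_fill(task: str, llm: Callable[[str], str]) -> str:
    """Two-step decomposition: outline first, then fill in the steps.

    `llm` is whatever function you use to get a completion for a prompt
    (a hypothetical stand-in here -- plug in your actual API call).
    """
    # Step 1: generate a high-level outline of the plan.
    outline = llm(f"Write a numbered outline of a plan for: {task}")

    # Step 2: the outline goes back into the context, acting as
    # written-down working memory for the second pass.
    return llm(
        "Here is an outline of a plan:\n"
        f"{outline}\n\n"
        "Now fill in the intermediate steps for each item in detail."
    )
```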
You seem to suggest that slowness is a problem, but speed can always be optimized. Humans also seem to have a thing where, after they repeat the same calculation sufficiently many times, they memorize the end result and don’t need to recalculate it each time anymore. You could copy this by having some mechanism that automatically detected when the LLM had done the same calculation many times. The mechanism would then use the output of that calculation to finetune the LLM, so that it could skip right to the end result the next time it needed to do the same calculation.
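A very rough sketch of what I mean, with entirely hypothetical names; the finetuning step is only stubbed out, since that’s a whole pipeline of its own:

```python
from collections import Counter

class RepeatDetector:
    """Counts how often the model has worked through the same calculation.

    Once a (prompt, result) pair has been seen `threshold` times, it gets
    queued as a finetuning example so the model can learn to emit the
    answer directly next time. Hypothetical sketch only.
    """

    def __init__(self, threshold: int = 10):
        self.threshold = threshold
        self.counts: Counter[tuple[str, str]] = Counter()
        self.finetune_queue: list[tuple[str, str]] = []

    def record(self, prompt: str, result: str) -> None:
        key = (prompt, result)
        self.counts[key] += 1
        if self.counts[key] == self.threshold:
            # Seen often enough: queue it so the model can be trained
            # to skip the step-by-step work and output the result.
            self.finetune_queue.append(key)

def finetune_on(examples: list[tuple[str, str]]) -> None:
    """Stand-in for an actual finetuning job over (prompt, answer) pairs."""
    ...
```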
Which to me seems like a huge improvement to its capabilities
This was actually my position when I started writing this post. My instincts told me that “thinking out loud” was a big enhancement to its capabilities. But then I started thinking about what I saw. I watched it spend tens of trillions of FLOPs to write out, in English, how to do a 3x3 matrix multiplication. It was so colossally inefficient, like building a humanoid robot and teaching it to use an abacus.
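(For scale, a back-of-envelope version of that figure, assuming a GPT-3-sized model at roughly 2 FLOPs per parameter per generated token, and guessing at the token count:)

```python
# Rough cost of writing out a 3x3 matrix multiplication in English,
# assuming a ~175B-parameter model and ~2 FLOPs per parameter per
# generated token. The token count is a guess.
params = 175e9
flops_per_token = 2 * params          # ~350 GFLOPs per generated token
tokens = 100                          # maybe a hundred tokens of worked steps
total_flops = flops_per_token * tokens
print(f"{total_flops:.1e} FLOPs")     # ~3.5e13, i.e. tens of trillions
```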
Then again, your analogy to humans is valid. We do a huge amount of processing internally, and then have this incredibly inefficient communication mechanism called writing, which we then use to solve very hard problems!
So my instincts point both ways on this, but I have nothing resembling rigorous proof one way or the other. So I’m pretty undecided.
I watched it spend tens of trillions of FLOPs to write out, in English, how to do a 3x3 matrix multiplication. It was so colossally inefficient, like building a humanoid robot and teaching it to use an abacus.
There’s also the case where it’s allowed to call other services that are more optimized for the specific use case in question, such as querying Wolfram Alpha.
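Roughly like the sketch below, say. This uses what I believe is Wolfram Alpha’s Short Answers API; treat the endpoint and parameters as assumptions to verify against the docs, and `WOLFRAM_APP_ID` is a placeholder for a real key.

```python
import os
import requests

def query_wolfram_alpha(question: str) -> str:
    """Offload a calculation to Wolfram Alpha instead of doing it 'out loud'.

    Uses what I believe is the Short Answers API endpoint; check the docs
    before relying on it. WOLFRAM_APP_ID is a placeholder for your own key.
    """
    resp = requests.get(
        "https://api.wolframalpha.com/v1/result",
        params={"appid": os.environ["WOLFRAM_APP_ID"], "i": question},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.text

# Illustrative usage:
# query_wolfram_alpha("determinant of {{1,2},{3,4}}")  # -> "-2"
```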