Another viewpoint that points in a different direction: a few years ago, LLMs could only do tasks that take humans ~minutes. Now they’re at the ~hours point. So if this trend continues, they’ll eventually do tasks that take humans days, weeks, months, …
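The extrapolation in that viewpoint is just compounding a doubling. A toy sketch of the arithmetic, to make it concrete (the starting horizon and doubling time below are made-up placeholders for illustration, not measured values):

```python
# Toy extrapolation of the "task horizon" trend. Both numbers are
# assumptions chosen for illustration, not measurements.
horizon_minutes = 10.0   # hypothetical: ~minutes-long tasks a few years ago
doubling_months = 7      # hypothetical doubling time

for step in range(9):
    months = step * doubling_months
    hours = horizon_minutes * 2**step / 60
    print(f"+{months:2d} months: tasks taking humans ~{hours:.1f} hours")
```

Under any fixed doubling time the horizon passes through hours and into days and weeks; the disagreement is over whether the trend actually holds that long.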
I don’t have good intuitions that would help me decide which of those viewpoints is better for predicting the future.
One reason to prefer my position is that LLMs still seem to be bad at the kind of tasks that rely on using serial time effectively. For those ML-research-style tasks, scaling up to human performance over a couple of hours relied on taking the best of multiple calls, which is really parallel time. That’s not the same as leaving an agent running for a couple of hours and seeing it work out something it previously would have been incapable of guessing (or that couldn’t be guessed at all, only discovered through interaction). I do struggle to think of tests like this that I’m confident an LLM would fail, though. It would probably have trouble winning a text-based RPG? Or, more practically: could an LLM file my taxes without committing fraud? How well do LLMs play board games these days?
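To pin down the parallel-vs-serial distinction I mean, here is a minimal sketch. All the names in it (`attempt`, `score`, `env_feedback`) are hypothetical stubs, not any real API: best-of-k spends extra compute on independent tries, while the serial loop lets each step condition on feedback the previous step produced.

```python
from concurrent.futures import ThreadPoolExecutor

def best_of_k(task, attempt, score, k=8):
    """Parallel-time scaling: k independent tries, keep the best.
    No try benefits from anything another try learned."""
    with ThreadPoolExecutor(max_workers=k) as pool:
        candidates = list(pool.map(lambda _: attempt(task), range(k)))
    return max(candidates, key=score)

def serial_agent(task, attempt, env_feedback, steps=8):
    """Serial-time scaling: each step conditions on feedback from the
    last, so the agent can learn things it couldn't have guessed."""
    context = task
    result = None
    for _ in range(steps):
        result = attempt(context)
        context = context + "\n" + env_feedback(result)
    return result
```

The point of the contrast: `best_of_k` can never discover anything mid-run, whereas `serial_agent` can in principle stumble onto facts only available through interaction.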
Yeah I think that’s a valid viewpoint.