This issue is further complicated by the fact that humans aren’t fully general reasoners without tool support either.
I think the discussion, not specifically here but just in general, vastly underestimates the significance of this point. It isn’t like we expect humans to solve meeting planning problems in our heads. I use Calendly, or Outlook’s scheduling assistant and calendar. I plug all the destinations into our GPS and re-order them until the time looks low enough. One of the main reasons we want to use LLMs for these tasks at all is that, even with tool support, they are not trivial or instant for humans to solve.
There is also a reason why standardized tests for kids so often include essay questions on breaking down tasks step by step, like (to pick an example from my own past) “describe in as much detail as possible how you would make a peanut butter and jelly sandwich.” Even aspiring professional chefs have to learn proper mise en place to keep on top of their (much more complex) cooking tasks. I won’t bother listing more examples, but most humans are not naturally good at these tasks.
Yes, current LLMs are worse on many axes. IDK if that would be true if we built wrappers to let them use the planning tools humans rely on in practice, and if we put them through the kinds of practice humans use to learn these skills IRL. I suspect they still would be, but to a much lesser degree. But then I also can’t help thinking about the constant stream of incredible-lack-of-foresight things I see other humans do on a regular basis, and wonder if I’m just overestimating us.
FWIW, after I wrote this comment, I asked Gemini what it thought. It came up with a very similar POV about what its limitations were, what tools would help it, and how much those tools would close the gap with humans. Also, it linked this blog post in its reply. https://gemini.google.com/app/a72701429c8d830a
I often think here of @sarahconstantin’s excellent ‘Humans Who Are Not Concentrating Are Not General Intelligences’.