Can you name the startup? I’d be very interested to see what level of success it’s achieved.
(Or if you can’t name the startup, I’d love to hear more about what’s been achieved, e.g. what are the largest and longest-horizon tasks that your scaffolded systems can reliably accomplish?)
You.com: turn on our “genius mode” (free users get a few uses per day) and try asking it to do a moderately complex calculation or, better still, to figure out how to do one and then do it.
Generally we find the combination of RAG web search and some agentic tool use makes an LLM appreciably more capable. (Similarly, OpenAI are also doing the tool use, and Google the web-scale RAG.)
We’re sticking to fairly short-horizon tasks, generally fewer than a dozen steps.