Intuitively, the AutoGPT concept sounds like it should be useful if a company invests in it. Yet all the big publicly available systems seem to be chat interfaces where the human writes a message and then the computer writes another message.
Even if an AutoGPT driven by an LLM alone wouldn’t achieve every goal, a combination where a human could oversee the steps and shepherd AutoGPT could likely be very productive.
The idea sounds to me like it’s simple enough that people at big companies should have considered it. Why isn’t something like that deployed?
When you start trying to make an agent, you realize how much your feedback, rerolls, etc. are what make chat-based LLMs useful.
In a chat-based LLM, the error-correction mechanism is you, and in the absence of that, it’s quite easy for agents to get off track.
You can of course add error-correction mechanisms like multiple LLMs checking each other, multiple chains of thought, etc., but the cost can quickly get out of hand.
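For illustration, here’s a minimal worker/critic sketch of that pattern using the Anthropic Python SDK. The model name and retry budget are arbitrary assumptions, and every retry is two more full calls, which is exactly where the cost blows up.

```python
# Worker/critic loop: one model call proposes a step, a second call judges it.
# MODEL and the retry budget are placeholder assumptions, not recommendations.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-3-5-sonnet-latest"

def propose(task: str) -> str:
    resp = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": f"Propose the single next step for: {task}"}],
    )
    return resp.content[0].text

def critique(task: str, step: str) -> bool:
    resp = client.messages.create(
        model=MODEL,
        max_tokens=16,
        messages=[{
            "role": "user",
            "content": f"Task: {task}\nProposed step: {step}\n"
                       "Answer YES if the step advances the task, otherwise NO.",
        }],
    )
    return resp.content[0].text.strip().upper().startswith("YES")

def checked_step(task: str, retries: int = 3) -> str | None:
    # Every rejected proposal costs two more calls; the bill scales with retries.
    for _ in range(retries):
        step = propose(task)
        if critique(task, step):
            return step
    return None
```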
This answer assumes that you either have a fully chat-based version or one that operates fully autonomously.
You could build something in the middle where every step of the agent gets presented to a human who can press next or correct the agent. An agent might even propose multiple ways forward and let the human decide. That then produces the training data for the agent to get better in the future.
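A minimal sketch of that middle ground; plan_next_step and execute are hypothetical stand-ins for your LLM call and your action runner:

```python
# Human-in-the-loop step approval: the agent proposes, the human accepts or
# corrects, and every decision is appended to a log usable as training data.
import json

def approve_loop(task: str, plan_next_step, execute, log_path: str = "decisions.jsonl"):
    done = False
    while not done:
        proposal = plan_next_step(task)   # e.g. an LLM call returning the next step
        print(f"Agent proposes: {proposal}")
        choice = input("[a]ccept / [e]dit / [q]uit: ").strip().lower()
        if choice == "q":
            break
        step = proposal if choice == "a" else input("Corrected step: ")
        done = execute(step)              # returns True once the task is finished
        with open(log_path, "a") as f:    # human corrections become training data
            f.write(json.dumps({
                "task": task,
                "proposed": proposal,
                "executed": step,
                "accepted": choice == "a",
            }) + "\n")
```

The same loop extends naturally to the multiple-proposals variant: have plan_next_step return a list of candidate steps and let the human pick an index instead of accepting a single one.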
This exists and is getting more popular, especially with coding, but also in other verticals.
Which ones do you see as the top ones?
I’ve been using Aider recently with coding. It’s a mixed bag, but overall I think I like it. You can configure whether it just acts, or asks for permission first.
I have an AI agent that wrote myself; I use it on average 5x per week over the last 6 months. I think it’s moderately useful. I mostly use it for simple shell tasks that would otherwise require copy-pasting back and forth with claude.ai.
My guess is that the big AI companies don’t think the market for this is big enough to be worth making a product out of it.
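Roughly, my agent’s pattern looks like the sketch below; the model name is a placeholder and the prompt is simplified from what I actually use.

```python
# Minimal shell-task agent: ask the model for one command, confirm, then run it.
import subprocess
import anthropic

client = anthropic.Anthropic()

def shell_task(request: str) -> None:
    resp = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=256,
        system="Reply with exactly one shell command and nothing else.",
        messages=[{"role": "user", "content": request}],
    )
    cmd = resp.content[0].text.strip()
    # The confirmation prompt is the whole safety model here.
    if input(f"Run `{cmd}`? [y/N] ").strip().lower() == "y":
        subprocess.run(cmd, shell=True, check=False)
```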
Best typo :D
Anthropic’s computer use model and Google’s Deep Research both do this. Training systems like this to work reliably has been a bottleneck to releasing them.