[Question] Why don’t we currently have AI agents?

ChristianKl26 Dec 2024 15:26 UTC

8 points

Intuitively, the AutoGPT concept sounds like it should be useful if a company invests in it. Yet, all the big publically available systems are seem to be chat interfaces where the human writes a messages and then the computer writes another message.

Even if AutoGPT-driven by an LLM alone wouldn’t achieve all ends, a combination where a human could oversee the steps and shepard AutoGPT, could likely be very productive.

The idea sounds to me like it’s simple enough that people at big companies should have considered it. Why isn’t something like that deployed?

ChristianKl26 Dec 2024 15:26 UTC

8 points

10 comments1 min readLW link

Matt Goldenberg 26 Dec 2024 20:19 UTC
16 points
3
When you start trying to make an agent, you realize how much your feedback, rerolls, etc are making chat based llms useful

the error correction mechanism is you in a chat based llms, and in the absence of that, it’s quite easy for agents to get off track

you can of course add error correction mechanism like multiple llms checking each other, multiple chains of thought, etc, but the cost can quickly get out of hand
- ChristianKl 27 Dec 2024 10:01 UTC
  2 points
  0
  Parent
  Is answer assumes that you either have a fully chat based version or one that operates fully autonomous.
  
  You could build something in the middle where every step of the agent gets presented to a human who can press next or correct the agent. An agent might even propose multiple ways forward and let the human decide. That then produces the training data for the agent to get better in the future.
  - Matt Goldenberg 27 Dec 2024 14:51 UTC
    4 points
    0
    Parent
    This exists and is getting more popular, especially with coding, but also in other verticals
    - ChristianKl 28 Dec 2024 1:19 UTC
      2 points
      0
      Parent
      Which one’s do you see as the top ones?
      - Nathan Helm-Burger 29 Dec 2024 3:07 UTC
        4 points
        0
        Parent
        I’ve been using Aider recently with coding. It’s a mixed bag, but overall I think I like it. You can configure whether it just acts, or asks for permission first.
Buck 26 Dec 2024 16:58 UTC
7 points
0
I have an AI agent that wrote myself; I use it on average 5x per week over the last 6 months. I think it’s moderately useful. I mostly use it for simple shell tasks that would otherwise require copy-pasting back and forth with claude.ai.
My guess is that the big AI companies don’t think the market for this is big enough to be worth making a product out of it.
- Leon Lang 26 Dec 2024 22:28 UTC
  7 points
  7
  Parent
  
  I have an AI agent that wrote myself
  
  Best typo :D
Caleb Biddulph 26 Dec 2024 16:04 UTC
4 points
1
Anthropic’s computer use model and Google’s Deep Research both do this. Training systems like this to work reliably has been a bottleneck to releasing them

Gordon Seidoh Worley 26 Dec 2024 20:36 UTC
2 points
−1
I can’t help but wonder if part of the answer is that they seem dangerous and people are selecting out of producing them.
Like I’m not an expert but creating AI agents seems extremely fun and appealing, and I’m intentionally working on it none because it seems safer not to build them. (Whether you think my contributions to trying to build them would matter or not is another question.)
Sodium 29 Dec 2024 8:13 UTC
1 point
1
I think the actual answer is: the AI isn’t smart enough and trips up a lot.

But I haven’t seen a detailed write up anywhere that talks about why the AI trips up and what are the types of places where it trips up. It feels like all of the existing evals work optimize for legibility/reproducibility/being clearly defined. As a result, it’s not measuring the one thing that I’m really interested in: why don’t we have AI agents replacing workers. I suspect that some startup’s internal doc on “why does our agent not work yet” would be super interesting to read and track over time.