@Soroush Pour was working on a startup, Harmony Intelligence, in all these areas; you may be interested in talking to him.
All the ideas below are a bit early, given how few economically valuable agents currently exist. However, I would wager that the next iteration of LLMs (e.g. GPT-5) will unlock a world of enterprise automation and will be able to perform basic consumer tasks like booking a flight or scheduling a dinner with friends.
I think GPT-5-level capabilities are not even required for that. What it takes is automated iteration and optimisation of "LLM programs" (or LM agent architectures; there is no bright line between these). There is already work in this direction, e.g., Promptbreeder, Self-Taught Optimizer, and other works.
GPT-4-level capabilities suffice for all this, and I think it will enter the mainstream (including industry) in 2024. Josh Albrecht (CTO of Imbue) alludes to this here (and, apparently, Imbue is working on productising something like its own version of Self-Taught Optimizer).
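To make "automated iteration of LLM programs" a bit more concrete, here is a minimal sketch assuming an LLM program can be reduced to a single prompt string. It is a plain elitist evolutionary search, not the actual Promptbreeder or Self-Taught Optimizer algorithm (Promptbreeder also evolves its own mutation prompts, and STOP has the LLM rewrite the optimiser itself). `optimise_program`, `toy_mutate`, `toy_score` and `PHRASES` are hypothetical names; in a real system the mutation step would be an LLM call and the score would come from a task benchmark.

```python
import random
from typing import Callable, List

# Hypothetical toy stand-ins for an LLM-driven mutation operator and a benchmark.
PHRASES = ["Think step by step.", "Answer concisely.", "Check your work.", "Use bullet points."]
REWARDED = ["Think step by step.", "Check your work."]


def toy_mutate(prompt: str, rng: random.Random) -> str:
    """Hypothetical mutation: append one instruction phrase (an LLM would rewrite the prompt)."""
    return prompt + " " + rng.choice(PHRASES)


def toy_score(prompt: str) -> float:
    """Hypothetical fitness: +1 per rewarded phrase present, minus a small length penalty."""
    return sum(phrase in prompt for phrase in REWARDED) - 0.01 * len(prompt)


def optimise_program(
    seed_prompt: str,
    mutate: Callable[[str, random.Random], str],
    score: Callable[[str], float],
    generations: int = 20,
    population_size: int = 4,
    rng_seed: int = 0,
) -> str:
    """Elitist evolutionary search over prompt strings; the best score never decreases."""
    rng = random.Random(rng_seed)
    population: List[str] = [seed_prompt]
    for _ in range(generations):
        # Propose children by mutating randomly chosen members of the current population.
        children = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        # Deduplicate while preserving order, so tie-breaking is deterministic.
        candidates = list(dict.fromkeys(population + children))
        # Keep the top-scoring candidates (parents included) for the next generation.
        population = sorted(candidates, key=score, reverse=True)[:population_size]
    return population[0]


SEED = "Solve the task."
best = optimise_program(SEED, toy_mutate, toy_score)
print(best, toy_score(best))
```

Because parents compete with their children, a good prompt is never lost; the open question in practice is how expensive and noisy the `score` step is when it means running an agent on real tasks.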
Agent testing environments
This is similar to building testing software for LLMs, but once systems become agentic / multi-step, it's even harder to build test cases. More importantly, besides managing test cases, one would likely need to be able to build agent environments easily and to set them up and tear them down automatically.
Incidentally, Imbue is also working on this: see Avalon, and Josh has also said that they are planning to add language capabilities to this environment.
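As a rough illustration of the set-up / tear-down requirement, here is a minimal sketch assuming an "environment" is just a scratch directory of files and an agent is any function that takes a task string and the environment's root path. `AgentTestCase`, `sandbox`, `run_suite` and `toy_agent` are hypothetical names; real agent environments (browsers, VMs, simulators like Avalon) would need much heavier isolation.

```python
import tempfile
from contextlib import contextmanager
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Iterator, List


@dataclass
class AgentTestCase:
    name: str
    initial_files: Dict[str, str]  # environment state created before the agent runs
    task: str  # instruction given to the agent
    check: Callable[[Path], bool]  # inspects the final environment state


@contextmanager
def sandbox(initial_files: Dict[str, str]) -> Iterator[Path]:
    """Set up a fresh environment directory; it is deleted when the block exits."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel_path, content in initial_files.items():
            path = root / rel_path
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(content)
        # TemporaryDirectory removes the directory on exit, including on exceptions.
        yield root


def run_suite(agent: Callable[[str, Path], None], cases: List[AgentTestCase]) -> Dict[str, bool]:
    """Run each case in its own isolated environment and record pass/fail."""
    results: Dict[str, bool] = {}
    for case in cases:
        with sandbox(case.initial_files) as env:
            try:
                agent(case.task, env)
                results[case.name] = case.check(env)
            except Exception:
                # A crashing agent counts as a failure; teardown still happens.
                results[case.name] = False
    return results


def toy_agent(task: str, env: Path) -> None:
    """Hypothetical agent that only knows how to rename a.txt to b.txt."""
    if "rename" in task:
        (env / "a.txt").rename(env / "b.txt")


EXAMPLE_CASES = [
    AgentTestCase(
        name="rename",
        initial_files={"a.txt": "hello"},
        task="rename a.txt to b.txt",
        check=lambda env: (env / "b.txt").read_text() == "hello" and not (env / "a.txt").exists(),
    ),
    AgentTestCase(
        name="delete",
        initial_files={"a.txt": "hello"},
        task="delete a.txt",
        check=lambda env: not (env / "a.txt").exists(),
    ),
]

print(run_suite(toy_agent, EXAMPLE_CASES))
```

Giving every case its own fresh environment keeps test cases independent of each other, which matters more for multi-step agents because one bad action can corrupt state for everything that follows.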