The inference-time compute argument is both the weakest and the most essential.
I think this will be done via multi-agent architectures (“society of mind” over an LLM).
This does require plenty of calls to an LLM, and hence plenty of inference-time compute.
For example, the current leader of https://huggingface.co/spaces/gaia-benchmark/leaderboard is this relatively simple multi-agent concoction by a Microsoft group: https://github.com/microsoft/autogen/tree/gaia_multiagent_v01_march_1st/samples/tools/autogenbench/scenarios/GAIA/Templates/Orchestrator
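To make the "plenty of calls" point concrete, here's a minimal sketch of the orchestrator pattern (my own illustration, not the AutoGen code; `call_llm` is a hypothetical stand-in for a real model API). Each round of the loop spends several LLM calls on planning, execution, and checking, so compute spent grows with the number of rounds:

```python
# Minimal sketch of an orchestrator-style multi-agent loop.
# `call_llm` is a placeholder for a real chat-completion API call;
# this is illustrative, not the AutoGen implementation.

def call_llm(prompt: str) -> str:
    """Stand-in for one model API request."""
    return f"<response to: {prompt[:40]}...>"

def solve(task: str, max_rounds: int = 5) -> str:
    transcript = [f"Task: {task}"]
    for _ in range(max_rounds):
        # One LLM call to pick the next subtask...
        plan = call_llm("As orchestrator, pick the next subtask:\n" + "\n".join(transcript))
        # ...another to execute it: 3 calls per round, so cost scales with rounds.
        result = call_llm(f"As worker, carry out this subtask: {plan}")
        transcript.append(f"Plan: {plan}\nResult: {result}")
        # ...and a third to decide whether to stop.
        verdict = call_llm("Is the task solved? Reply DONE or CONTINUE:\n" + "\n".join(transcript))
        if "DONE" in verdict:
            break
    return call_llm("Write the final answer:\n" + "\n".join(transcript))

print(solve("Find the population of the capital of France."))
```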
I think the cutting edge in this direction is probably non-public at this point (which makes a lot of sense).