I’ve read that OpenAI and DeepMind are hiring for multi-agent reasoning teams. I can imagine that gives another source of scaling.
I figure things like Amdahl's law and communication overhead impose some limits there, but MCTS could probably find useful ways to divide up the reasoning work and get the agents communicating with at least human-level efficiency.
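To put a rough number on "some limits": the usual Amdahl-style bound (treating multi-agent reasoning as a parallelisation problem is my own simplification here) is

$$S(n) = \frac{1}{(1-p) + p/n},$$

where $p$ is the fraction of the reasoning that can actually be split across agents and $n$ is the number of agents. Even with $p = 0.9$, the speedup tops out at 10x no matter how many agents you add, and that's before paying any communication overhead.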
I’ve seen a bunch of people arguing that recent reasoning models are only useful for tasks we can verify automatically.
I’m not sure this is necessarily true.
Reading the rStar paper has me thinking that if someone manages to turn the RL handle on mostly-general reasoning, using automatically verifiable tasks to power the training, it seems plausible that they'd end up locking onto something that generalises enough to be superhuman on other tasks.
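For concreteness, by "automatically verifiable" I mean reward functions along these lines (a toy sketch of the general idea, not anything taken from the rStar paper; the function names and exact-match check are just illustrative):

```python
def verify_math(answer: str, ground_truth: str) -> float:
    # Toy verifier: reward 1.0 if the model's final answer matches the
    # known-correct answer. Real pipelines normalise first (strip
    # whitespace, parse numbers, compare symbolically).
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0

def verify_code(candidate_source: str, tests: str) -> float:
    # Toy verifier for code tasks: run hidden unit tests against the
    # generated code and reward only if they all pass. Illustrative
    # only; a real harness would sandbox this rather than exec directly.
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # defines the candidate function(s)
        exec(tests, namespace)             # assert statements raise on failure
        return 1.0
    except Exception:
        return 0.0
```

The point is that the reward comes from a checker rather than from human labels, so the RL loop can keep running for as long as you have checkable problems.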
It’s a shame that little things like counting and tokenization seem to be muddying the waters for LLM poetry (although maybe my understanding here is out of date). If that weren’t the case, poetry feels like it’d be a nice way to check out-of-distribution reasoning power.
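As a small illustration of the tokenization point (tiktoken and the cl100k_base encoding are just one concrete example; the exact splits depend on the model's tokenizer):

```python
import tiktoken

# Show how words break into subword tokens that don't line up with
# letters or syllables, which is part of why counting-flavoured tasks
# (syllables, rhymes, letter counts) are awkward for LLMs.
enc = tiktoken.get_encoding("cl100k_base")
for word in ["strawberry", "onomatopoeia"]:
    pieces = [enc.decode([t]) for t in enc.encode(word)]
    print(f"{word!r} -> {pieces}")
# The pieces generally don't correspond to syllables, so a model has to
# learn character-level structure indirectly.
```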