MCTS seems difficult in “rich” (complex/high dimensional problem domains, continuous, stochastic, large state/action spaces) environments (e.g. the real world)?
My conclusion was that AGI progress would be deep-learning-based into the indefinite future, not pretrained transformers.
Naive MCTS in the real world does seem difficult to me, but an action network, for example, constrains the actual search significantly. Imagine a value network that is good at checking whether solutions work (say, by executing generated code and evaluating the output), with a plain old LLM plugged in as the action network; in theory it could explore the large solution space better than beam search or argmax+temperature [0].
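
To make that concrete, here is a rough Python sketch of what I mean. Everything in it is made up for illustration: propose_actions is a stub where an actual LLM call would go (with hard-coded candidate programs so it runs end to end), the "value network" is replaced by literally executing each candidate against a single toy test case, and none of it comes from the linked paper.

    import math
    import random
    import subprocess
    import sys

    class Node:
        def __init__(self, state, parent=None):
            self.state = state      # partial program text built up so far
            self.parent = parent
            self.children = []
            self.visits = 0
            self.value_sum = 0.0

    def propose_actions(partial_program, k=3):
        # Stand-in for the action network: in practice, sample k candidate
        # continuations from an LLM conditioned on the task and the partial
        # program. Hard-coded candidates here just so the sketch runs.
        candidates = [
            "print(int(input()) * 2)\n",
            "print(int(input()) + 2)\n",
            "print(input())\n",
        ]
        return random.sample(candidates, min(k, len(candidates)))

    def evaluate(program, test_input, expected_output):
        # The "value network" in the loose sense above: execute the candidate
        # against a test case instead of learning a value function.
        try:
            result = subprocess.run(
                [sys.executable, "-c", program],
                input=test_input, capture_output=True, text=True, timeout=5,
            )
        except subprocess.TimeoutExpired:
            return 0.0
        ok = result.returncode == 0 and result.stdout.strip() == expected_output.strip()
        return 1.0 if ok else 0.0

    def uct(node, c=1.4):
        # Upper-confidence bound used to pick which branch to descend.
        return (node.value_sum / node.visits
                + c * math.sqrt(math.log(node.parent.visits) / node.visits))

    def search(test_input, expected_output, iterations=50):
        root = Node("")
        for _ in range(iterations):
            # Selection: follow UCT down to a leaf.
            node = root
            while node.children:
                node = max(node.children, key=uct)
            # Expansion + evaluation: propose continuations, score each one by
            # actually running it, and backpropagate the result up the tree.
            for action in propose_actions(node.state):
                child = Node(node.state + action, parent=node)
                node.children.append(child)
                reward = evaluate(child.state, test_input, expected_output)
                n = child
                while n is not None:
                    n.visits += 1
                    n.value_sum += reward
                    n = n.parent
        # Return the root child with the best average score ("max child";
        # picking the most-visited child is the more common variant).
        return max(root.children, key=lambda n: n.value_sum / n.visits).state

    if __name__ == "__main__":
        # Toy task: find a program that reads "21" and prints "42".
        print(search(test_input="21", expected_output="42"))

In a real setup you would sample continuations from the model, score against a full test suite rather than one case, and probably use the LLM's own token likelihoods as a prior in the selection step, but the shape of the loop stays the same.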
0: https://openreview.net/forum?id=Lr8cOOtYbfL is from February; I found it after writing this comment, on the hunch that someone else had probably had the same idea.