Ben Livengood comments on Is “Recursive Self-Improvement” Relevant in the Deep Learning Paradigm?

Ben Livengood 6 Apr 2023 22:56 UTC
3 points
Naive MCTS in the real world does seem difficult to me, but action networks, for example, constrain the actual search significantly. Imagine a value network that is good at seeing whether solutions work (maybe by executing generated code and evaluating the output), with a plain old LLM plugged in as the action network; it could theoretically explore the large solution space better than beam search or greedy (argmax) decoding and temperature sampling[0].

0: https://openreview.net/forum?id=Lr8cOOtYbfL is from February; I found it after writing this comment, having figured someone else had probably had the same idea.
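
To make the idea above a bit more concrete, here is a toy sketch of that loop: a stub "action network" proposes next tokens with priors, and a "value" function executes the finished candidate and checks its output. Everything in it is an assumption for illustration: the names `policy`, `value`, `search`, `TOKENS` and `TARGET` are hypothetical, the policy is a uniform stub standing in for an LLM, the value function is an exact checker rather than a learned network, and the PUCT-style selection rule is just one common choice. It is not the method from the linked paper.

```python
import math
import random

TOKENS = ["1", "2", "3", "+", "*"]  # hypothetical toy vocabulary
MAX_LEN = 5                         # programs look like "a op b op c"
TARGET = 7                          # hypothetical spec: expression must equal 7

def policy(prefix):
    """Stub for the LLM action network: (next_token, prior) pairs.
    A real system would use the LLM's next-token probabilities here."""
    last_is_digit = bool(prefix) and prefix[-1].isdigit()
    allowed = [t for t in TOKENS if t.isdigit() != last_is_digit]
    return [(t, 1.0 / len(allowed)) for t in allowed]

def value(program):
    """Stub for the value network: execute the candidate, check the output."""
    try:
        result = eval("".join(program), {"__builtins__": {}})
    except SyntaxError:
        return 0.0
    return 1.0 if result == TARGET else 0.0

class Node:
    def __init__(self, prefix, prior):
        self.prefix, self.prior = prefix, prior
        self.children, self.visits, self.total = [], 0, 0.0

def puct(parent, child, c=1.5):
    """Mean value plus a prior-weighted exploration bonus."""
    q = child.total / child.visits if child.visits else 0.0
    return q + c * child.prior * math.sqrt(parent.visits + 1) / (1 + child.visits)

def rollout(prefix, rng):
    """Complete a partial program by sampling tokens from the policy."""
    program = list(prefix)
    while len(program) < MAX_LEN:
        tokens, priors = zip(*policy(program))
        program.append(rng.choices(tokens, weights=priors)[0])
    return program

def search(iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node([], 1.0)
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: walk down the tree, always taking the best PUCT score.
        while node.children:
            parent = node
            node = max(parent.children, key=lambda ch: puct(parent, ch))
            path.append(node)
        # Expansion: ask the policy for continuations of this prefix.
        if len(node.prefix) < MAX_LEN:
            node.children = [Node(node.prefix + [t], p)
                             for t, p in policy(node.prefix)]
        # Evaluation: finish the program and run it.
        program = rollout(node.prefix, rng)
        reward = value(program)
        if reward == 1.0:
            return program
        # Backup: propagate the result along the visited path.
        for visited in path:
            visited.visits += 1
            visited.total += reward
    return None

if __name__ == "__main__":
    found = search()
    print("".join(found) if found else "no solution found")
```

In a real setting the priors would steer the search toward plausible continuations instead of being uniform, and the execution check would have to be sandboxed; the toy only shows how the two roles fit together.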