abhayesian comments on Proposal: Using Monte Carlo tree search instead of RLHF for alignment research

abhayesian 21 Apr 2023 1:25 UTC
2 points
I would also like to see some sort of symbolic optimization process operating as a wrapper around an LLM, acting as an interpretable bridge between the black-box model and the real world, but I doubt that Monte Carlo Tree Search or Expectimax is the right sort of algorithm. Maybe something closer to a GOFAI planner that calls the LLM and parses its outputs, in a way similar to Factored Cognition, would be better and much more computationally efficient.
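
To make the idea a bit more concrete, here is a minimal sketch of a symbolic planner that recursively asks a model to decompose a task and parses the reply into subtasks. Everything in it is an assumption for illustration: `stub_llm` is a hypothetical stand-in that returns canned text rather than calling a real model, and the `DECOMPOSE:` prompt, the `ATOMIC` marker, and the semicolon-separated reply format are invented conventions, not part of any existing system.

```python
from typing import Callable, List


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns canned replies."""
    canned = {
        "DECOMPOSE: make tea": "boil water; steep leaves",
        "DECOMPOSE: boil water": "ATOMIC",
        "DECOMPOSE: steep leaves": "ATOMIC",
    }
    # Anything the stub does not know is treated as not decomposable.
    return canned.get(prompt, "ATOMIC")


def parse_subtasks(reply: str) -> List[str]:
    """Parse a reply into subtasks (assumed format: 'a; b; c' or 'ATOMIC')."""
    reply = reply.strip()
    if reply == "ATOMIC":
        return []
    return [part.strip() for part in reply.split(";") if part.strip()]


def plan(task: str, llm: Callable[[str], str],
         depth: int = 0, max_depth: int = 3) -> List[str]:
    """Recursively decompose a task into an ordered list of atomic steps."""
    if depth >= max_depth:
        # Depth limit reached: stop decomposing and keep the task as a step.
        return [task]
    subtasks = parse_subtasks(llm(f"DECOMPOSE: {task}"))
    if not subtasks:
        return [task]
    steps: List[str] = []
    for sub in subtasks:
        steps.extend(plan(sub, llm, depth + 1, max_depth))
    return steps


print(plan("make tea", stub_llm))  # ['boil water', 'steep leaves']
```

The control flow lives entirely in ordinary code, so every decomposition step can be logged and inspected, and the number of model calls grows with the size of the task tree rather than with the number of rollouts a search procedure would need.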