Personally, I think approaches like STaR (28 March 2022) will be important: bootstrap from weak chain-of-thought reasoners to strong ones by retraining on successful inner monologues. They also implement “backward chaining”: training on monologues generated with the correct answer visible.
Personally, I think approaches like STaR (28 March 2022) will be important: bootstrap from weak chain-of-thought reasoners to strong ones by retraining on successful inner monologues. They also implement “backward chaining”: training on monologues generated with the correct answer visible.