Quintin Pope comments on Evolution provides no evidence for the sharp left turn

Quintin Pope 21 Apr 2023 9:53 UTC
20 points
4
Autonomous learning basically requires a generator-discriminator gap in the domain in question, i.e., the agent trying to improve its capabilities in that domain has to be better at telling its own good outputs from its bad ones than it is at generating good outputs in the first place. If it can do so, it can just produce a bunch of outputs, score them, and train / reward itself on the better ones. In both situations you note (AZ and human mathematicians) there’s such a gap, because game victories and math results can both be verified more easily than they can be generated.
If current LMs have such generator-discriminator gaps in a given domain, they can also learn autonomously, up to the limit of their discrimination ability (which might improve as they get better at generation).
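To make the generate / score / train-on-the-better-outputs loop concrete, here is a minimal toy sketch. Everything in it is an assumption for illustration and not a description of how any real system is trained: the "generator" is just a weighted distribution over the integers 0 to 99, the "discriminator" is a hypothetical `verify` function that can cheaply check an answer it could not easily guess, and "training" means bumping the weight of any sampled output that passes the check.

```python
import random


def verify(x: int) -> bool:
    # Hypothetical cheap discriminator: checking a candidate is easy,
    # even though guessing the right one from 100 options is not.
    return x * x == 1369  # only x == 37 passes


def self_train(rounds: int = 50, batch: int = 200, lr: float = 1.0, seed: int = 0) -> float:
    """Toy self-improvement loop: sample, score, reinforce what passes.

    Returns the generator's final probability of producing a verified output.
    """
    rng = random.Random(seed)
    candidates = list(range(100))
    weights = [1.0] * len(candidates)  # generator starts out uniform

    for _ in range(rounds):
        # 1. Generate a batch of outputs from the current generator.
        samples = rng.choices(candidates, weights=weights, k=batch)
        # 2. Score each output with the discriminator.
        # 3. "Train" on the good ones by increasing their weight.
        for x in samples:
            if verify(x):
                weights[x] += lr

    return weights[37] / sum(weights)


if __name__ == "__main__":
    print(f"P(good output) after self-training: {self_train():.3f}")
```

The generator starts with a 1% chance of a good output and climbs from there using nothing but its own samples and the verifier. The same loop also shows the ceiling: if `verify` wrongly accepted some bad outputs, those would be reinforced just as readily, so the generator can only get as good as the discriminator lets it.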
What links here?
- “Sharp Left Turn” discourse: An opinionated review by Steven Byrnes (28 Jan 2025 18:47 UTC; 195 points)