What makes math problems hard for reinforcement learning: a case study

Link post

Abstract: Using a long-standing conjecture from combinatorial group theory, we explore, from multiple angles, the challenges of finding rare instances carrying disproportionately high rewards. Based on lessons learned in the mathematical context defined by the Andrews-Curtis conjecture, we propose algorithmic improvements that can be relevant in other domains with ultra-sparse reward problems. Although our case study can be formulated as a game, its shortest winning sequences are potentially 10^6 or 10^9 times longer than those encountered in chess. In the process of our study, we demonstrate that one of the potential counterexamples due to Akbulut and Kirby, whose status escaped direct mathematical methods for 39 years, is stably AC-trivial.
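To make the game formulation in the abstract concrete: the Andrews-Curtis conjecture asks whether every balanced presentation of the trivial group can be transformed into the trivial presentation by a sequence of elementary "AC moves" (concatenating one relator with another, inverting a relator, or conjugating it). Below is a minimal Python sketch of such a game environment. The string encoding of words, the exact move set, and the AK(3) example at the end are our own illustrative choices, not the paper's implementation.

```python
# A minimal sketch (not the authors' code) of the Andrews-Curtis "game":
# states are balanced presentations <x, y | r1, r2>, actions are AC moves,
# and the only reward comes at the trivial presentation.
# Words are strings over 'x', 'y' and their inverses 'X', 'Y'.

INV = str.maketrans("xXyY", "XxYy")

def inverse(w: str) -> str:
    """Inverse of a word: reverse it and invert each letter."""
    return w[::-1].translate(INV)

def reduce_word(w: str) -> str:
    """Freely reduce: cancel adjacent inverse pairs like 'xX' or 'Yy'."""
    out = []
    for c in w:
        if out and out[-1] == c.translate(INV):
            out.pop()
        else:
            out.append(c)
    return "".join(out)

def ac_moves(r1: str, r2: str):
    """All states reachable from (r1, r2) by one move.

    Concatenation, inversion, and conjugation by a single generator;
    composing these yields the general AC moves.
    """
    states = [
        (reduce_word(r1 + r2), r2),      # r1 <- r1 * r2
        (r1, reduce_word(r2 + r1)),      # r2 <- r2 * r1
        (inverse(r1), r2),               # r1 <- r1^(-1)
        (r1, inverse(r2)),               # r2 <- r2^(-1)
    ]
    for g in "xXyY":                     # conjugation: r <- g * r * g^(-1)
        states.append((reduce_word(g + r1 + inverse(g)), r2))
        states.append((r1, reduce_word(g + r2 + inverse(g))))
    return states

def reward(r1: str, r2: str) -> float:
    """Ultra-sparse reward: 1 only at the trivial presentation <x, y | x, y>."""
    return 1.0 if {r1, r2} == {"x", "y"} else 0.0

# One possible encoding of AK(3), the Akbulut-Kirby presentation mentioned
# in the abstract: relators x^3 y^(-4) and x y x y^(-1) x^(-1) y^(-1).
r1, r2 = "xxxYYYY", "xyxYXY"
print(ac_moves(r1, r2)[:2])
```

The only rewarded state is the trivial presentation, so a random player essentially never wins, and, as the abstract notes, winning sequences can be astronomically long. This is the ultra-sparse reward problem in its purest form.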

Introduction

We live in an extraordinary era where artificial intelligence (AI) is transforming numerous sectors and professions. Recent advancements in Large Language Models (LLMs) have empowered AI to read, write, and converse with a proficiency comparable to that of human experts. In the realm of board games, AI has outperformed even the most skilled human players; it has also tackled complex scientific challenges like protein folding, where steady progress was suddenly overtaken by a near-complete solution.

As AI continues to evolve, one critical question remains: How wide is the range of domains in which AI systems can reason as effectively as humans? Mathematics appears to be a natural next step on the path toward Artificial General Intelligence (AGI), since it shares the universal syntactic and logical structure of natural language. Additionally, mathematics provides a framework for the quantitative evaluation of logical and analytical reasoning, making it an ideal domain for self-improving AI systems.

In a moment, we will explain another reason why mathematics could play a crucial role in AGI development, but first, we need to introduce one more key element: reinforcement learning (RL). Machine learning, a subfield of AI, involves developing algorithms and statistical models that enable computers to learn from data and make predictions. Among the three primary areas of machine learning—supervised learning, unsupervised learning, and reinforcement learning—RL emphasizes learning through interaction with an environment and receiving feedback in the form of rewards or penalties. This aspect of machine learning, often characterized by its focus on AI models ‘playing games,’ will be central to our discussion.
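For readers less familiar with RL, the following Python sketch shows the agent-environment loop described above. The environment and policy here are toy placeholders of our own, not from the paper, but the sparse-reward structure mirrors the kind of problem discussed in the abstract.

```python
# A generic sketch of the RL interaction loop: an agent acts in an
# environment and receives feedback only through scalar rewards.
import random

class LineWalkEnv:
    """Toy environment: start at 0, reach position 10; reward only at the goal."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 or +1; the episode ends when the goal is reached
        self.state += action
        done = (self.state == 10)
        reward = 1.0 if done else 0.0   # sparse reward: feedback only on success
        return self.state, reward, done

def run_episode(env, policy, max_steps=1000):
    """One round of the agent-environment interaction loop."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

# A random policy rarely reaches the goal -- the agent must learn a better one.
env = LineWalkEnv()
print(run_episode(env, policy=lambda s: random.choice([-1, 1])))
```

The defining feature is that the agent is never told the right move; it must discover good strategies purely from which action sequences eventually earn reward.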

A typical chess game lasts about 30 to 40 moves, with the longest recorded professional game reaching 269 moves, ending in a draw between Ivan Nikolic and Goran Arsovic in 1989. Notably, the number of moves in a typical chess game is relatively consistent, with the longest professional game having only about an order of magnitude more moves than the average. Similarly, a typical game of Go involves a few hundred moves, with the longest recorded professional game, played by Go Seigen and Kitani Minoru in 1933, lasting 411 moves.

Read the rest on arXiv