I think the appeal of symbolic and hybrid approaches is clear, and progress in this direction would absolutely transform ML capabilities. However, I believe the approach remains immature in a way that the phrase “Human-Level Reinforcement Learning” doesn’t communicate.
The paper uses classical symbolic methods and so faces that classic enemy of GOFAI: super-exponential asymptotics. In order to make the compute more manageable, the following are hard-coded into EMPA:
Direct access to game state (unlike the neural networks, which learned from pixels)
The existence of walls, and which objects are walls
The 14 possible object interactions (that some objects are dangerous, some can be pushed, some are walls, etc.)
Which object is the player, and what type of player (Shooter or MovingAvatar), and which objects are the player’s bullets
The form of the objective (always some object count == 0)
That object interactions are deterministic
That picking up resources is good
The physics of projectile firing: reward was directly transported from a simulation of what a fired projectile hit, obviating the need to plan over that long time horizon
etc, etc, etc
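To make concrete what hard-coding the objective buys you, here is a hypothetical sketch (my own illustration, not EMPA's actual code) of the fixed objective form. Because winning is always "some object count == 0", the agent never has to discover what winning means; it only searches over which object type to zero out:

```python
# Hypothetical sketch, not from the paper: a fixed objective form.
from collections import Counter

def make_goal_test(target_type):
    """The goal is always: the count of some object type reaches zero."""
    def goal_reached(state):
        counts = Counter(obj["type"] for obj in state["objects"])
        return counts[target_type] == 0
    return goal_reached

# The space of candidate goals is just one per object type, rather
# than the open-ended space of objectives a learner would otherwise
# have to consider.
state = {"objects": [{"type": "alien"}, {"type": "alien"}, {"type": "wall"}]}
candidate_goals = {t: make_goal_test(t) for t in ("alien", "wall")}
print(candidate_goals["alien"](state))  # False: two aliens remain
```

Each hard-coded item in the list above prunes the hypothesis space in a similar way, which is exactly what keeps the symbolic search tractable.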
Additionally, the entire algorithm is tuned to their own custom dataset. None of this would be feasible for Atari games, or indeed the GVGAI competition, whose video game description language they use to write their own environments. There’s a reason they don’t evaluate on any of the many existing benchmarks.
I come across a paper like this every once in a while: “The Revenge of GOFAI”. Dileep George et al.’s Recursive Cortical Networks. DeepMind’s Apperception Engine. Tenenbaum’s own Omniglot solver. They have splashy titles and exciting abstracts, but look into the methods section and you’ll find a thousand bespoke and clever shortcuts: feature engineering for the modern age. It’s another form of overfitting; it doesn’t generalize. The super-exponential wall remains as sheer as ever, and these approaches simply cannot scale.
I’ll reiterate that any progress in these areas would mean substantially more powerful, more explainable models. I applaud these researchers for their work on a hard and important problem. However, I can’t consider these papers to represent progress. Instead, I find them aspirational, like the human mind itself: that our methods might someday truly be this capable, without the tricks. I’m left hoping and waiting for insight of a qualitatively different sort.
I think the response I’m most sympathetic to is something like “yes, currently to get human-level results you need to bake in a lot of knowledge, but as time goes on we will need less and less of this”. For example, instead of having a fixed form for the objective, you could have a program sketch in a probabilistic programming language. Instead of hardcoded state observations, you use the features from a learned model. In order to cross the super-exponential barrier, you also sprinkle neural networks around: instead of using a small grammar of possible programs, you use the distribution induced by something like Codex; when doing probabilistic inference, you train a neural network to predict the output of the inference; etc. There are lots of details to be filled in here, but that is true in any path to AGI.
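The “sprinkle neural networks around” idea can be made a bit more concrete with a toy sketch (my own illustration, with made-up names; the learned scorer here is a stand-in for something like a Codex-induced distribution over programs): instead of exhaustively enumerating a grammar, a learned proposal ranks candidate programs so the search tries likely ones first, under a fixed budget, rather than paying the full exponential cost:

```python
# Toy sketch of neural-guided program search (hypothetical).
import heapq

def guided_search(candidates, score_fn, check_fn, budget=3):
    """Try only the top-scoring candidates; score_fn stands in for a
    learned proposal distribution over programs."""
    ranked = heapq.nlargest(budget, candidates, key=score_fn)
    for program in ranked:
        if check_fn(program):  # e.g. does this program explain the data?
            return program
    return None

# Stand-in "neural" scorer: just a lookup table here.
scores = {"count==0": 0.9, "count==1": 0.3, "count>=5": 0.1}
found = guided_search(scores, scores.get, lambda p: p == "count==0")
print(found)  # "count==0"
```

The bet is that the learned proposal absorbs most of the search cost, leaving the symbolic machinery to do only a small, tractable amount of verification.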
(Though I still generally expect AGI where the main ingredient is “scaled up neural networks”.)