I’m not sure what level of sophistication you want, but here’s an answer:
Performance on the games is much better, and the amount of game time it takes for the AI to reach a given level of performance is much lower. Yet fundamentally they are doing the same thing: solving a Markov Decision Process (abbreviated MDP) using Reinforcement Learning and Deep Learning. Pretty much any problem where there’s anything resembling an “environment state” in which you make “decisions” can be modelled as an MDP. Small MDPs can be solved exactly with dynamic programming.
For instance, if you have a 10 by 10 grid with different “rewards” placed at each grid location and a player moving around on the grid collecting them, you can compute exactly the path the player should take. That is because the environment is small and has only 100 states, so it’s easy to just store a table with 100 values and rules like “at location (4,3) go right with probability 0.9”. The real world is much larger, so you can’t run the algorithm that solves the MDP exactly; what you need are approximations of that exact algorithm.
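To make the “exact” case concrete, here’s a minimal sketch of value iteration (the dynamic programming algorithm) on a toy 10 by 10 grid. The rewards, discount factor, and deterministic movement rules are all made up for illustration, not taken from any particular paper:

```python
# Value iteration on a 10x10 grid world: the "exact algorithm" for small MDPs.
# Grid size, rewards, and discount factor are assumed purely for illustration.
import numpy as np

SIZE = 10          # 10x10 grid -> 100 states
GAMMA = 0.95       # discount factor (assumed)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

rng = np.random.default_rng(0)
rewards = rng.uniform(0, 1, size=(SIZE, SIZE))  # a reward at every cell

V = np.zeros((SIZE, SIZE))  # one value per state: the "table with 100 values"

def step(r, c, a):
    """Deterministic move; bumping into a wall keeps you in place."""
    nr = min(max(r + a[0], 0), SIZE - 1)
    nc = min(max(c + a[1], 0), SIZE - 1)
    return nr, nc

# Repeatedly back up the best one-step return until the table stops changing.
for _ in range(500):
    new_V = np.zeros_like(V)
    for r in range(SIZE):
        for c in range(SIZE):
            new_V[r, c] = max(
                rewards[r, c] + GAMMA * V[step(r, c, a)] for a in ACTIONS
            )
    delta = np.max(np.abs(new_V - V))
    V = new_V
    if delta < 1e-6:
        break

# The policy just reads the table: at location (4,3), move toward the
# neighbouring cell with the highest value.
best = max(ACTIONS, key=lambda a: V[step(4, 3, a)])
print("at (4,3) move", best)
```

With only 100 states the whole thing finishes in a fraction of a second, which is exactly why the tabular approach stops being an option once the state space explodes.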
The whole field of Deep Reinforcement Learning right now is basically about making better and better approximations of this exact algorithm that work in the real world. In this sense almost everything DeepMind has been doing is the same: building better approximations for the same underlying algorithm. But the approximations have gotten way, way better, lots of tricks and heuristics have been developed, and people now understand how to make the algorithms run consistently (at the beginning there was a lot of “you need this particular random seed to make it work”).
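Here’s a hedged sketch of what “approximation” means in the simplest case: instead of storing one value per state, you fit a parametric function from sampled experience. I’ve used a linear model on the same toy grid to keep it short; Deep RL methods like DQN replace the linear model with a deep network and add many stabilising tricks, so treat this as the basic idea rather than anyone’s actual method. The features and hyperparameters are invented for the example:

```python
# Q-learning with a linear function approximator: the parameters W replace
# the 100-entry value table. Environment, features, and hyperparameters are
# all assumed for illustration only.
import numpy as np

SIZE, GAMMA, ALPHA, EPS = 10, 0.95, 0.05, 0.1
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
rng = np.random.default_rng(0)
rewards = rng.uniform(0, 1, size=(SIZE, SIZE))

def features(r, c):
    """Tiny hand-made state features; a deep net would learn these instead."""
    return np.array([r / SIZE, c / SIZE, (r / SIZE) * (c / SIZE), 1.0])

W = np.zeros((len(ACTIONS), 4))     # learned parameters instead of a table

def q(r, c):
    return W @ features(r, c)       # one estimated Q-value per action

def step(r, c, a):
    nr = min(max(r + ACTIONS[a][0], 0), SIZE - 1)
    nc = min(max(c + ACTIONS[a][1], 0), SIZE - 1)
    return nr, nc

r, c = 0, 0
for t in range(20000):
    # epsilon-greedy action selection
    a = int(rng.integers(len(ACTIONS))) if rng.random() < EPS else int(np.argmax(q(r, c)))
    nr, nc = step(r, c, a)
    # one-step TD target, then a gradient step on the squared error
    target = rewards[nr, nc] + GAMMA * np.max(q(nr, nc))
    td_error = target - q(r, c)[a]
    W[a] += ALPHA * td_error * features(r, c)
    r, c = nr, nc

print("approximate Q-values at (4,3):", q(4, 3))
```

The point of the contrast: the table version is exact but only works when you can enumerate every state, while the fitted version generalises across states at the cost of being approximate, and most of the progress has come from making that approximation less fragile.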