Some aspects of this remind me of generative adversarial networks (GANs).
In one use case: the Generator network (Babbler) takes some noise as input and generates an image. The Discriminator network (a sort of Pruner) tries to judge whether that image came from the set of actual photographs or from the Generator. The Discriminator wins if it guesses correctly; the Generator wins if it fools the Discriminator. Both networks get trained up and get better and better at their tasks. Eventually (if things go right) the Generator produces photorealistic images.
So the pruning happens in two stages: first the Discriminator learns to recognize bad Babble by comparing the Babble with “reality”. Then the Generator learns the structure behind what the Discriminator catches, narrowing its target for what to generate so that it doesn’t produce that kind of unrealistic Babble in the first place. And the process iterates: once the Generator learns not to make the more obvious mistakes, the Discriminator learns to catch subtler ones.
GANs share the failure mode of a too-strict Prune filter, or more specifically of a Discriminator that is much better than the Generator. If every image the Generator produces is confidently recognized as a fake, the Generator gets no feedback about some pieces of Babble being better than others, so it stops learning.
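To make the mechanism concrete, here is a minimal sketch of that alternating game in PyTorch. The tiny fully-connected networks, the 64-dimensional noise, and the learning rates are placeholder assumptions for illustration, not details from any particular GAN paper:

```python
import torch
import torch.nn as nn

# Placeholder architectures; real image GANs use convolutional networks.
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())  # Babbler
D = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1))              # Pruner

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def training_step(real_images):                      # real_images: (batch, 784)
    batch = real_images.size(0)
    noise = torch.randn(batch, 64)

    # Discriminator ("Prune") step: separate real photos from Babble.
    fakes = G(noise).detach()                        # don't update G on this pass
    d_loss = (bce(D(real_images), torch.ones(batch, 1))
              + bce(D(fakes), torch.zeros(batch, 1)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator ("Babble") step: try to get fakes labelled as real.
    g_loss = bce(D(G(noise)), torch.ones(batch, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()

    # If D is far stronger than G, g_loss saturates and G gets almost no
    # gradient signal -- the too-strict Prune filter failure mode above.
    return d_loss.item(), g_loss.item()
```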
(Some other features of Babble aren’t captured by GANs.)
Yes, and this concept and these connections have already been discussed in 5 or 10 different posts on LW and related blogs; see e.g. this, though I won’t bother to compile a full list.
(Note that I still like the post; converging on a “catchy” way to put an important concept is valuable.)
Cool, it seems like we’re independently circumambulating the same set of ideas. I’m curious how much your models agree with the more fleshed out version I described in the other post.
I find the similarity between modern chatbots and the babble/prune model a better fit. For example, the recent MILA chatbot uses several response models to generate candidate responses based on the dialogue history, and then a response selection policy to choose which of the responses to return.
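A schematic of that generate-then-select pattern (this is just the shape of the pipeline, not MILA’s actual code or models):

```python
from typing import Callable, List

# Babble: every response model proposes a candidate reply for the dialogue history.
# Prune: a selection policy scores the candidates and returns the best one.
def respond(history: List[str],
            response_models: List[Callable[[List[str]], str]],
            score: Callable[[List[str], str], float]) -> str:
    candidates = [model(history) for model in response_models]
    return max(candidates, key=lambda reply: score(history, reply))

# Toy usage with stand-in models and a trivial length-based scorer.
models = [lambda h: "Tell me more.", lambda h: "Why do you say that?"]
print(respond(["I had a strange day."], models, lambda h, r: len(r)))
```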
More generally, the concept of separate algorithms for action proposal and action evaluation is quite widespread in modern deep learning. For example, you can think of AlphaGo’s policy network as serving the action proposal/babble role, while the MCTS procedure does the action evaluation/pruning. (The same split shows up in any game tree search algorithm that uses a heuristic to expand promising nodes.) Or, with some stretching, you can think of actor-critic reinforcement learning algorithms as being composed of babble/prune parts.
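Here is that proposal/evaluation split as a bare-bones depth-limited search; a sketch only, assuming a `propose` function in the policy-network role and an `evaluate` heuristic in the pruning role (plain negamax over proposed moves, not AlphaGo’s actual MCTS):

```python
from typing import Callable, List, Tuple

def negamax(state,
            propose: Callable[[object], List[object]],    # Babble: promising moves only
            evaluate: Callable[[object], float],          # Prune: heuristic leaf score, from the
                                                          # perspective of the player to move
            apply_move: Callable[[object, object], object],
            depth: int) -> Tuple[object, float]:
    """Best proposed move and its backed-up value for the player to move."""
    best_move, best_value = None, float("-inf")
    for move in propose(state):                           # assumed non-empty
        child = apply_move(state, move)
        if depth <= 1:
            value = -evaluate(child)                      # opponent's view, negated
        else:
            _, child_value = negamax(child, propose, evaluate, apply_move, depth - 1)
            value = -child_value
        if value > best_value:
            best_move, best_value = move, value
    return best_move, best_value
```

The point is just the division of labor: `propose` keeps the branching factor small, and `evaluate` decides which of the surviving branches to keep.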
GANs fall into the Babble/Prune model mainly insofar as there are two parts, one serving as action proposal and the other as action evaluation; beyond this high level, the fit feels very forced. I think that, from modern deep learning, both the MILA chatbot and AlphaGo’s MCTS procedure are much better fits to the babble/prune model than GANs.