London L. comments on Models Don’t “Get Reward”

London L. 22 Feb 2023 0:37 UTC
7 points
To (rather gruesomely) link this back to the dog analogy, RL is more like asking 100 dogs to sit, breeding the dogs which do sit and killing those which don’t. Over time, you will have a dog that can sit on command. No dog ever gets given a biscuit.
The phrasing I find most clear is this: Reinforcement learning should be viewed through the lens of selection, not the lens of incentivisation.

I was talking through this with an AGI Safety group today. I think the selection lens is helpful and illustrates your point, but I don’t think the analogy quoted above is quite accurate.

The analogy you give is very similar to genetic algorithms, where models that get high reward are blended with other high-reward models and then mutated randomly. The only thing that pushes performance up is the choice of which models get blended and mutated, and none of that requires any knowledge of how to improve the models’ performance.

In other words, carrying out the breeding process requires no knowledge about what will make the dog more likely to sit. You just breed the ones that do sit. In essence, not even the breeder “gets the connection between the dogs’ genetics and sitting”.
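To make that concrete, here is a minimal sketch of such a selection loop. Everything in it is an assumption for illustration: the `reward` function, its hidden target, and the `evolve` parameters are hypothetical stand-ins, not anything from the post. The thing to notice is that the loop only ever compares reward values. It never asks how a parameter relates to the reward.

```python
import random

def reward(params):
    # Hypothetical black-box score: higher when params are near a hidden target.
    # The breeder below only ever sees the returned number.
    target = [0.5, -1.0, 2.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def evolve(pop_size=100, n_params=3, generations=50, keep=20, noise=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-3, 3) for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: rank by reward and keep the top scorers ("the dogs that sit").
        population.sort(key=reward, reverse=True)
        parents = population[:keep]
        children = []
        while len(children) < pop_size - keep:
            a, b = rng.sample(parents, 2)
            # Blend two parents coordinate by coordinate, then mutate randomly.
            child = [rng.choice(pair) + rng.gauss(0, noise) for pair in zip(a, b)]
            children.append(child)
        population = parents + children
    return max(population, key=reward)
```

Calling `evolve()` returns a parameter list that scores well, even though no step in the loop used any knowledge of why it scores well.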

However, at the start of your article, you describe gradient descent, which requires knowledge about how to get the models to perform better at the task. You need the gradient (how reward changes under tiny shifts to the model parameters) in order to perform gradient descent.

In gradient descent, just as in genetic algorithms, the model doesn’t get access to the gradient or to information about the reward. But whoever performs the update still needs to know how reward depends on the model’s parameters (via its behavior). In essence, the model trainer “gets the connection between the models’ parameters and performance”.
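As a rough illustration of the trainer’s side of this, here is a minimal sketch with a one-parameter “dog”. The names `sit_probability`, `expected_reward_gradient`, and `train` are hypothetical, and I’m assuming a reward of 1 for sitting and 0 otherwise, so the expected reward is just the probability of sitting. It is written as gradient ascent on reward, which is the same thing as gradient descent on negative reward.

```python
import math

def sit_probability(theta):
    # Hypothetical one-parameter "dog": probability of sitting on command.
    return 1.0 / (1.0 + math.exp(-theta))

def expected_reward_gradient(theta):
    # Assumed reward: 1 for sitting, 0 otherwise, so expected reward equals
    # sit_probability(theta). Its derivative with respect to theta is p * (1 - p).
    p = sit_probability(theta)
    return p * (1.0 - p)

def train(theta=-2.0, learning_rate=1.0, steps=200):
    for _ in range(steps):
        # The trainer, not the model, uses the gradient to edit the parameter.
        theta += learning_rate * expected_reward_gradient(theta)
    return theta
```

Unlike a breeding loop, `train` cannot run without `expected_reward_gradient`, which encodes exactly the parameter-to-reward connection. The model itself (`sit_probability`) never sees the reward or the gradient.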

So I think a revised version of the dog analogy that takes this connection information into account might be something more like:
1. Ask a single dog to sit, and monitor its brain activity during and afterwards.
2. Search for dogs with brain structures that you think would lead to them sitting, based on what you observed in the first dog’s brain.
3. Ask one of those dogs to sit and monitor its brain activity.
4. Based on what you find, search for a better brain structure for sitting, and so on.
At the end, you’ll likely have found a dog that sits when you ask it to.

A more violent version would be:
1. Ask a single dog to sit and monitor its brain activity.
2. Kill the dog and modify its brain in a way you think would make it more likely to sit. Also remove all memories of its past life. (This is equivalent to trying to find a dog with a brain that’s more conducive to sitting.)
3. Reinsert the new brain into the dog, ask it to sit again, and monitor its brain activity.
4. Repeat the process until the dog sits consistently.

To reiterate, with your breeding analogy, the person doing the breeding doesn’t need to know anything about how the dogs’ brains relate to sitting. They just breed them and hope for the best, just like in genetic algorithms. However, with this brain-modification analogy, you do need to know how the dog’s brain relates to sitting. You modify the brain in a way that you think will make it better, just like in gradient descent.

I’m not 100% sure why I think that this is an important distinction, but I do, so I figured I’d share it, in hopes of making the analogy less wrong.