kpreid comments on Models Don’t “Get Reward”

kpreid 17 Jan 2023 23:13 UTC
3 points
1

What distinguishes this from how my brain works?

Your brain stores memories of input and also of previous thoughts you had and the experience of taking actions. Within the “replaced with a new version” view of the time evolution of your brain (which is also the pure-functional-programming view of a process communicating with the outside world), we can say that the input it receives next iteration contains lots of information from outputs it made in the preceding iteration.

But with the reinforcement learning algorithm, the previous outputs are not given as input. Rather, the previous outputs are fed to the reward function, and the reward function’s output is fed to the gradient descent process, and that determines the future weights. It seems like a much noisier channel.

Also, individual parts of a brain (or ordinary computer program with random access memory) can straightforwardly carry state forward that is mostly orthogonal to state in other parts (thus allowing semi-independent modules to carry out particular algorithms); it seems to me that the model cannot do that — cannot increase the bandwidth of its “train of thought while being trained” — without inventing an encoding scheme to embed that information into its performance on the desired task such that the best performers are also the ones that will think the next thought. It seems fairly implausible to me that a model would learn to execute such an internal communication system, while still outcompeting models “merely” performing the task being trained.

(Disclaimer: I’m not familiar with the details of ML techniques; this is just loose abstract thinking about that particular question of whether there’s actually any difference.)