I have been struggling to find a way to respond, here.
When discussing this, we have to be really careful not to slip back and forth between “global RL”, in the sense that the whole system learns through RL, and “micro-RL”, in which bits of the system use something like RL. I keep trying to emphasize that I have no problem with the latter, if it proves feasible. I would never “claim that no reinforcement learning is going on in humans” because, quite the contrary, I believe it really IS going on there.
So where does that leave my essay, and this discussion? Well, a few things are important.
1 -- The incredible minuteness of the feasible types of RL must be kept in mind. In its pure form, RL explodes combinatorially, or simply becomes infeasible, once the micro-domain rises above the reflex (or insect) level.
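To make that explosion concrete, here is a minimal sketch (my own illustration, not anything from the essay) of why pure, tabular RL stops scaling: the value table it must fill in grows exponentially with the number of state variables.

```python
# Hypothetical illustration: the number of entries a tabular RL value
# function must store grows exponentially with the number of state
# features, which is one reason pure RL is confined to reflex-level
# micro-domains.

def q_table_size(n_features: int, values_per_feature: int, n_actions: int) -> int:
    """Entries in a tabular Q-function over a discrete state space."""
    return (values_per_feature ** n_features) * n_actions

# A reflex-level problem: 5 binary sensors, 4 actions.
print(q_table_size(5, 2, 4))    # 128 entries -- trivially learnable
# A still-modest domain: 40 binary features, 10 actions.
print(q_table_size(40, 2, 10))  # roughly 10^13 entries -- hopeless to fill by trial and error
```

Function approximation and other tricks push this boundary out somewhat, but the underlying exponential growth is the point.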
2 -- We need to remember that plain old “adaptation” is not RL. So, is there an adaptation mechanism that builds (e.g.) low-level feature detectors in the visual system? I bet there is. Does it work by trying to optimize a single parameter? Maybe. Should we call that parameter a “reward” signal? Well, I guess we could. But it is equally possible that such mechanisms are simultaneously optimizing several parameters, not just one. And it is just as likely that they are following rules that cannot be shoehorned into the architecture of RL (there being many other kinds of adaptation). Where am I going with this? Well, why would we care to distinguish the “RL style of adaptation mechanism” from other kinds of adaptation, down at that level? Why make a special distinction? When you think about it, those micro-RL mechanisms are boring and unremarkable … RL only becomes worth remarking on IF it is the explanation for intelligence as a whole. The behaviorists thought they were the Isaac Newtons of psychology, because they thought that something like RL could explain everything. And it is only when RL is proposed at that global level that it has dramatic significance, because then you could imagine an RL-controlled AI building and amplifying its own intelligence without programmer intervention.
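As one concrete example of adaptation-that-is-not-RL (my illustration, not a claim about what the visual system actually does): Oja's Hebbian rule builds feature detectors purely from local input/output activity. There is no reward signal anywhere in the update to point at.

```python
import numpy as np

# A minimal sketch of a non-RL adaptation rule: Hebbian learning with
# weight decay (Oja's rule). The weight vector drifts toward the
# dominant direction of the input statistics, driven only by local
# activity -- no reward, no return, no policy.

rng = np.random.default_rng(0)

def oja_step(w, x, lr=0.01):
    """One Oja's-rule update: w moves toward the input's principal
    component while its norm self-normalizes toward 1."""
    y = w @ x
    return w + lr * y * (x - y * w)

w = rng.normal(size=3)
for _ in range(2000):
    # Anisotropic inputs: the first component has the largest variance.
    x = rng.normal(size=3) * np.array([3.0, 1.0, 0.5])
    w = oja_step(w, x)

print(np.round(np.abs(w), 2))  # weights align with the dominant input axis
```

You could relabel the output activity `y` as a “reward” if you insisted, but nothing is gained by the relabeling, which is the point of the paragraph above.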
3 -- Most importantly, if there do exist some “micro-RL” mechanisms somewhere in an intelligence, at very low levels where RL is feasible, those instances do not cause any of their properties to bleed upward to higher levels. This is the same as a really old saw … that, just because computers do all their basic computation in binary, that does not mean the highest levels of the computer must use binary numbers. Sometimes you say things that imply that because RL could exist somewhere, we could therefore learn “maybe something” from those mechanisms when it comes to other, higher aspects of the system. That really, really does not follow, and it is a dangerous mistake to make.
So, at the end of the day, my essay was targeting the use of the RL idea ONLY in those cases where it was assumed to be global. All other appearances of something RL-like just do not have any impact on arguments about AI motivation and goals.