Re: concepts, I’d be curious to hear any thoughts you might have on any part of my concept safety posts.
That’s a lot of stuff to read (apologies: my bandwidth is limited at the moment) but my first response on taking a quick glance through is that you mention reinforcement learning an awful lot …… and RL is just a disaster.
I absolutely do not accept the supposed “neuroscience” evidence that the brain uses RL. If you look into that evidence in detail, it turns out to be flimsy. There are two criticisms. First, virtually any circuit can be made to look like it has RL in it, if there is just a bit of feedback and some adaptation—so in that sense finding evidence for RL in some circuit is like saying “we found a bit of feedback and some adaptation”, which is a trivial result.
The second criticism of RL is that the original idea was that it operated at a high level in the system design. Finding RL features buried in the low level circuit behavior does not imply that it is present in any form whatsoever, in the high level design—e.g. at the concept level. This is for the same reason that we do not deduce, from the fact that computer circuits only use zeros and ones at the lowest level, that therefore they can only make statements about arithmetic if those statements contain only zeros and ones.
The net effect of these two observations, taken with the historical bankruptcy of RL in the psychology context, is that any attempt to use it in discussions of concepts, nowadays, seems empty.
I know that only addresses a tiny fraction of what you said, but at this point I am worried, you see: I do not know how much the reliance on RL will have contaminated the rest of what you have to say ……
Thanks. You are right, I do rely on an RL assumption quite a lot, and it’s true that it has probably “contaminated” most of the ideas: if I were to abandon that assumption, I’d have to re-evaluate all of the ideas.
I admit that I haven’t dug very deeply into the neuroscience work documenting the brain using RL, so I don’t know to what extent the data really is flimsy. That said, I would be quite surprised if the brain didn’t rely strongly on RL. After all, RL is the theory of how an agent should operate in an initially unknown environment where the rewards and punishments have to be learned… which is very much the thing that the brain does.
Another thing that makes me assign confidence to the brain using RL principles is that I (and other people) have observed in people a wide range of peculiar behaviors that would make perfect sense if most of our behavior was really driven by RL principles. It would take me too long to properly elaborate on that, but basically it looks to me strongly like things like this would have a much bigger impact on our behavior than any amount of verbal-level thinking about what would be the most reasonable thing to do.
I don’t disagree with the general drift here. Not at all.
The place where I have issues is actually a little subtle (though not too much so). If RL appears in a watered-down form all over the cognitive system, as an aspect of the design, so to speak, this would be entirely consistent with all the stuff that you observe, and which I (more or less) agree with.
But where things get crazy is when it is seen as the core principle, or main architectural feature of the system. I made some attempts to express this in the earliest blog post on my site, but the basic story is that IF it is proposed as the MAIN mechanism, all hell breaks loose. The reason is that for it to be a main mechanism it needs supporting machinery to find the salient stimuli, find plausible (salient) candidate responses, and it needs to package the connection between these in a diabolically simplistic scalar (S-R contingencies), rather than in some high-bandwidth structural relation. If you then try to make this work, a bizarre situation arises: so much work has to be done by all the supporting machinery, that it starts to look totally insane to insist that there is a tiny, insignificant little S-R loop at the center of it all!
That, really, is why behaviorism died in psychology. It was ludicrous to pretend that the supporting machinery was trivial. It wasn’t. And when people shifted their focus and started looking at the supporting machinery, they came up with …… all of modern cognitive psychology! The idea of RL just became irrelevant, and it shriveled away.
There is a whole book’s worth of substance in what happened back then, but I am not sure anyone can be bothered to write it, because all the cogn psych folks just want to get on with real science rather than document the dead theory that wasn’t working. Pity, because AI people need to read that nonexistent book.
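To make the point about the supporting machinery concrete, here is a minimal tabular Q-learning sketch (standard textbook RL, nothing brain-specific, and the helper names are purely illustrative). The “reinforcement learning” proper is the single scalar update at the end; everything it depends on (deciding what counts as a state, which responses are candidate actions, what the reward signal even is) has to be supplied from outside the loop.

```python
import random

# Minimal tabular Q-learning sketch (illustrative only). The "reinforcement
# learning" proper is the one-line scalar update in update(); the state
# construction, candidate actions and reward signal all have to be supplied
# by supporting machinery that this sketch simply assumes as given.

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = {}  # (state, action) -> learned scalar value

def q(state, action):
    return Q.get((state, action), 0.0)

def choose_action(state, candidate_actions):
    # Epsilon-greedy choice over whatever candidate responses the rest of
    # the system has already decided are worth considering.
    if random.random() < EPSILON:
        return random.choice(candidate_actions)
    return max(candidate_actions, key=lambda a: q(state, a))

def update(state, action, reward, next_state, next_actions):
    # The entire "S-R contingency": one scalar nudged toward a target.
    best_next = max((q(next_state, a) for a in next_actions), default=0.0)
    Q[(state, action)] = q(state, action) + ALPHA * (
        reward + GAMMA * best_next - q(state, action)
    )
```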
Okay. In that case I think we agree. Like I mentioned in my reply to ChristianKl, I do feel that RL is an important mechanism to understand, but I definitely don’t think that you could achieve a very good understanding of the brain if you only understood RL. Necessary but not sufficient, as the saying goes.
Any RL system that we want to do something non-trivial with needs to be able to apply the things it has learned in one state to other, similar states, which in turn requires some very advanced learning algorithms to correctly recognize “similar” states. (I believe that’s part of the “supporting machinery” you referred to.) Having just the RL component doesn’t get you anywhere near intelligence by itself.
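To illustrate what I mean (a toy sketch only, with made-up names, not a claim about how the brain implements this): the moment you replace an exact lookup table with even the simplest kind of generalization, the RL update itself stays trivial, and the real work moves into the function that decides which states count as “similar”.

```python
import numpy as np

# Toy sketch: Q-learning with linear function approximation. The TD update
# stays a one-liner; how learning transfers between states is determined
# entirely by features(), i.e. by the notion of state similarity.

N_FEATURES = 16
weights = {}  # action -> weight vector

def features(state):
    # Stub standing in for the "supporting machinery": in any serious system
    # this is where most of the difficulty lives, because it decides which
    # situations end up looking similar to the learner.
    rng = np.random.default_rng(abs(hash(state)) % (2 ** 32))
    return rng.standard_normal(N_FEATURES)

def q_value(state, action):
    w = weights.setdefault(action, np.zeros(N_FEATURES))
    return float(w @ features(state))

def td_update(state, action, reward, next_state, actions, alpha=0.05, gamma=0.9):
    target = reward + gamma * max(q_value(next_state, a) for a in actions)
    error = target - q_value(state, action)
    weights[action] = weights[action] + alpha * error * features(state)
```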
It would take me too long to properly elaborate on that, but basically it looks to me strongly like things like this would have a much bigger impact on our behavior than any amount of verbal-level thinking about what would be the most reasonable thing to do.
That seems to me like an argument from lack of imagination. The fact that reinforcement learning is the best explanation among those you can easily imagine doesn’t mean that it’s the best one overall.
If reinforcement learning were the primary way we learn, understanding Anki cards before you memorize them shouldn’t be as important as it is. Having a card fail after 5 repetitions because the initial understanding wasn’t deep enough to build a foundation on suggests that learning is about more than just reinforcement. Creating that initial strong understanding of a card doesn’t feel to me like something that’s about reinforcement learning.
On a theoretical level, reinforcement learning is basically behaviorism. It’s not that behaviorism never works, but modern cognitive behavioral therapy has moved beyond it. CBT does things that aren’t well explained by behaviorism.
You can get rid of a phobia via reinforcement learning, but it takes a lot of time and gradual change. There are various published techniques that are simply faster.
Pigeons manage to beat humans at the Monty Hall problem: http://www.livescience.com/6150-pigeons-beat-humans-solving-monty-hall-problem.html The pigeons engage the problem with reinforcement learning, which is in this case a good strategy. Humans, on the other hand, don’t use that strategy and get different outcomes. To me that suggests a lot of high-level human thought is not about reinforcement learning.
Given our bigger brains, we should be able to beat the pigeons, or at least be as good as them, if we used the same strategy.
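A quick toy simulation (my own illustration, not code from the linked study) shows what a plain reward-driven learner does with the problem: it never reasons about the setup, it just tracks the scalar payoff of each action, and its values still drift toward switching, roughly the strategy reported for the pigeons.

```python
import random

# Toy simulation: an epsilon-greedy learner playing repeated Monty Hall
# rounds. It only tracks the average payoff of "stay" vs "switch" and
# ends up preferring to switch. (Illustrative only.)

def play(switch):
    doors = [1, 0, 0]                    # exactly one winning door
    random.shuffle(doors)
    pick = random.randrange(3)
    # Host opens a losing door that is not the contestant's pick.
    opened = next(d for d in range(3) if d != pick and doors[d] == 0)
    if switch:
        pick = next(d for d in range(3) if d not in (pick, opened))
    return doors[pick]                   # 1 if the prize was won, else 0

values = {"stay": 0.0, "switch": 0.0}
alpha, epsilon = 0.05, 0.1
for _ in range(5000):
    if random.random() < epsilon:
        action = random.choice(list(values))
    else:
        action = max(values, key=values.get)
    reward = play(action == "switch")
    values[action] += alpha * (reward - values[action])

print(values)   # "switch" ends up near 2/3, "stay" near 1/3
```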
Oh, I definitely don’t think that human learning would only rely on RL, or that RL would be the One Grand Theory Explaining Everything About Learning. (Human learning is way too complicated for any such single theory.) I agree that e.g. the Anki card example you mention requires more blocks to explain than RL.
That said, RL would help explain things like why many people’s efforts to study via Anki so easily fail, and why it’s important to make each card contain as little to recall as possible—the easier it is to recall the contents of a card, the better the effort/reward ratio, and the more likely that you’ll remain motivated to continue studying the cards.
You also mention CBT. One of the basic building blocks of CBT is the ABC model, where an Activating Event is interpreted via a subconscious Belief, leading to an emotional Consequence. Where do those subconscious Beliefs come from? The full picture is quite complicated (see appraisal theory, the more theoretical and detailed version of the ABC model), but I would argue that at least some of the beliefs look like they could be produced by something like RL.
As a simple example, someone once tried to rob me at a particular location, after which I started being afraid of taking the path leading through that location. The ABC model would describe this as saying that the Activating event is (the thought of) that location, the Belief is that that location is dangerous, and the Consequence of that belief is fear and a desire to avoid that location… or, almost equivalently, you could describe that as an RL process having once received a negative reward at that particular location, and therefore assigning a negative value to that location since that time.
That said, I did reason that even though it had happened once, I’d just been unlucky that time, and I knew on other grounds that that location was just as safe as any other. So I forced myself to take that path anyway, and eventually the fear vanished. So you’re definitely right that we also have brain mechanisms that can sometimes override the judgments produced by the RL process. But I expect that even their behavior is strongly shaped by RL elements… e.g. if I had tried to make myself walk that path several times and failed each time, I would soon have acquired the additional Belief that trying to overcome that fear is useless, and given up.
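To spell out the RL reading of that story (a purely illustrative toy model with arbitrary numbers, not a claim about the actual neural implementation): one strongly negative experience pushes the location’s learned value down, and repeated uneventful walks through it slowly pull the value back toward neutral, which matches how the exposure felt from the inside.

```python
# Toy value-learning model of the robbery example. One strongly negative
# experience drives the location's value down; repeated uneventful walks
# (exposure) pull it back toward neutral. Numbers are arbitrary.

alpha = 0.2    # learning rate
value = 0.0    # learned value of "the path through that location"

value += alpha * (-10.0 - value)                 # the attempted robbery: large negative reward
print(f"after the incident: {value:.2f}")        # -2.00

for _ in range(20):                              # forcing myself to take the path anyway
    value += alpha * (0.0 - value)               # nothing bad happens: reward 0
print(f"after 20 uneventful walks: {value:.2f}") # close to 0
```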
I think it is very important to consider the difference between a descriptive model and a theory of a mechanism.
So, inventing an extreme example for purposes of illustration, if someone builds a simple, two-parameter model of human marital relationships (perhaps centered on the idea of cost and benefits), that model might actually be made to work, to a degree. It could be used to do some pretty simple calculations about how many people divorce, at certain income levels, or with certain differences in income between partners in a marriage.
But nobody pretends that the mechanism inside the descriptive model corresponds to an actual mechanism inside the heads of those married couples. Sure, there might be! But there doesn’t have to be, and we are pretty sure there is no actual calculation inside a particular mechanism that matches the calculation in the model. Rather, we believe that reality involves a much more complex mechanism that has that behavior as an emergent property.
When RL is seen as a descriptive model (which I think is the correct way to view it in your above example), that is fine and good as far as it goes.
The big trouble that I have been fighting is the apotheosis from descriptive model to theory of a mechanism. And since we are constructing mechanisms when we do AI, that is an especially huge danger that must be avoided.
I agree that this is an important distinction, and that things that might naively seem like mechanisms are often actually closer to descriptive models.
I’m not convinced that RL necessarily falls into the class of things that should be viewed mainly as descriptive models, however. For one, what’s possibly the most general-purpose AI built so far seems to have been developed by explicitly having RL as an actual mechanism. That seems to me like a moderate data point towards RL being an actual useful mechanism and not just a description.
Though I do admit that this isn’t necessarily that strong of a data point—after all, SHRDLU was once the most advanced system of its time too, yet basically all of its mechanisms turned out to be useless.
Arrgghh! No. :-)
The DeepMind Atari agent is the “most general-purpose AI developed so far”?
!!!
At this point your reply is “I am not joking. And don’t call me Shirley.”
So I forced myself to take that path anyway, and eventually the fear vanished.
The fact that you don’t consciously notice fear doesn’t mean that it’s completely gone. It still might raise your pulse a bit. Physiological responses in general persist longer.
To the extent that you removed the fear in that case, I do agree that the exposure therapy was driven by RL. On the other hand, it’s slow.
I don’t think you need a belief to have a working Pavlovian trigger. When playing around with anchoring in NLP, I don’t think that a physical anchor is well described as working via a belief. Beliefs seem to me to be separate entities. They usually exist as “language”/semantics.
I’m not familiar with NLP, so I can’t comment on this.
Do you have experience with other process oriented change work techniques? Be it alternative frameworks or CBT?
I think it’s very hard to reason about concepts like beliefs. We have a naive understanding of what the word means but there are a bunch of interlinked mental modules that don’t really correspond to naive language. Unfortunately they are also not easy to study apart from each other.
Having reference experiences of various corner cases seems to me to be required to get to grips with concepts.
Not sure to what extent these count, but I’ve done various CFAR techniques, mindfulness meditation, and Non-Violent Communication (which I’ve noticed is useful not only for improving your communication, but also for dissolving your own annoyances and frustrations even in private).
Do you think that resolving an emotional frustration via NVC is done via reinforcement learning?
No.
The pigeons engage the problem with reinforcement learning
How do you know? When a scientist rewards pigeons for learning, the fact that the pigeons learn doesn’t prove anything about how the pigeons are doing it.
Of course they are a black box and could in theory use a different method. On the other hand, their choices are comparable to the ones that an RL algorithm would make, while the humans’ choices are farther apart.
I agree with Richard Loosemore’s interpretation (but I am not familiar with the neuroscience he is referring to):
First, virtually any circuit can be made to look like it has RL in it, if there is just a bit of feedback and some adaptation—so in that sense finding evidence for RL in some circuit is like saying “we found a bit of feedback and some adaptation”, which is a trivial result.
The main point that I wanted to make wasn’t about pigeon intelligence, but that the heuristics humans use differ from RL results, and that in cases like this the pigeons produce results that are similar to RL, so it’s not a problem of cognitive resources.
The difference tells us something worthwhile about human reasoning.
Uhm. Is there any known experiment that has been tried and has failed with respect to RL?
In the sense: has there been an experiment where one says RL should predict X, but X did not happen? The lack of such a conclusive experiment would be some evidence in favor of RL, provided of course that the lack of such an experiment is not due to other reasons, such as an inability to design a proper test (indicating a lack of understanding of the properties of RL) or the experiment not happening due to real-world impracticalities (not enough attention having been paid to RL, not enough funding for a proper experiment to have been conducted, etc.).
In general scientists do a lot of experiments where they make predictions about learning, and those predictions turn out to be false. That goes for predictions based on RL as well as predictions based on other models.
Wikipedia describes RL as:
Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
Given that it’s an area of machine learning, you usually don’t find psychologists talking about RL. They talk about behaviorism. There are tons of papers published on behaviorism, and after a while the cognitive revolution came along and most psychologists moved beyond RL.
Not quite true, especially not if you count neuroscientists as psychologists. There have been quite a few papers by psychologists and neuroscientists talking about reinforcement learning in the last few years alone.
It appears to me that ChristianKl just listed four. Did you have something specific in mind?
Uhm, I kind of felt the pigeon experiment was a little misleading.
Yes, the pigeons did a great job of switching doors and learning through RL.
Human RL, however (it seems to me), takes place in a more subtle manner. While the pigeons seemed to focus on more object-level productivity, human RL would seem to take a more complicated route.
But even that’s kind of beside the point.
In the article that Kaj had posted above, with Amy Sutherland trying the LRS on her husband, it was interesting to note that the RL was happening at a rather unconscious level. In the Monty Hall type of problem-solving cognition, the brain is working at a much more conscious, active level.
So it seems more than likely to me that while RL works in humans, it gets easily overridden, if you will, by conscious deliberate action.
One other point is also worth noting in my opinion.
Human brains come with a lot more baggage than pigeon brains. Therefore, it is more than likely that humans have learnt not to switch through years of reinforcement learning, which makes it much harder to unlearn the same thing in a shorter period of time. The pigeons, having less cognitive load, may have had a lot less to unlearn, which may have made it easier for them to learn the switching pattern.
Also, I just realised that I didn’t quite answer your question. Sorry about that; I got carried away in my argument.
But the answer is no, I don’t have anything specific in mind. Also, I don’t know enough about things like what effects RL will have on memory, preferences etc. But I kind of feel that I could design an experiment if I knew more about it.