Introduction to Connectionist Modelling of Cognitive Processes: a chapter by chapter review

This chapter by chapter review was inspired by Vaniver’s recent chapter by chapter review of Causality. As with that review, the intention is not so much to summarize as to help readers determine whether or not they should read the book. Reading the review is in no way a substitute for reading the book.

I first read Introduction to Connectionist Modelling of Cognitive Processes (ICMCP) as part of an undergraduate course on cognitive modelling. We were assigned one half of the book to read: I ended up reading every page. Recently I felt like I should read it again, so I bought a used copy off Amazon. That was money well spent: the book was just as good as I remembered.

By their nature, artificial neural networks (referred to as connectionist networks in the book) are a very mathy topic, and it would be easy to write a textbook that was nothing but formulas and very hard to understand. While ICMCP also spends a lot of time talking about the math behind the various kinds of neural nets, it does its best to explain things as intuitively as possible, sticking to elementary mathematics and elaborating on the reasons why the equations are what they are. At this, it succeeds – the book can be easily understood by someone who knows only high school math. I haven’t personally studied ANNs at a more advanced level, but I would imagine that anybody who intended to do so would greatly benefit from the strong conceptual and historical understanding that ICMCP provides.

The book also comes with a floppy disk containing a tlearn simulator which can be used to run various exercises given in the book. I haven’t tried using this program, so I won’t comment on it, nor on the exercises.

The book has 15 chapters, and it is divided into two sections: principles and applications.

Principles

1: “The basics of connectionist information processing” provides a general overview of how ANNs work. The chapter begins by providing a verbal summary of five assumptions of connectionist modelling: that 1) neurons integrate information, 2) neurons pass information about the level of their input, 3) brain structure is layered, 4) the influence of one neuron on another depends on the strength of the connection between them, and 5) learning is achieved by changing the strengths of connections between neurons. After this verbal introduction, the basic symbols and equations relating to ANNs are introduced simultaneously with an explanation of how the “neurons” in an ANN model work.
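To make the equations feel concrete, here is a minimal sketch of my own (not code from the book, and with made-up numbers) of what a single unit in such a model computes: a weighted sum of its inputs, passed through an activation function.

```python
import numpy as np

# Activations of three sending units, and the weights of their connections
# to one receiving unit (all values chosen arbitrarily for illustration).
inputs = np.array([1.0, 0.0, 1.0])
weights = np.array([0.5, -0.3, 0.8])

# The unit integrates its input as a weighted sum of the activations...
net_input = np.dot(inputs, weights)          # 0.5 + 0.0 + 0.8 = 1.3

# ...and passes on a graded signal about the level of that input,
# here via a logistic (sigmoid) activation function.
activation = 1.0 / (1.0 + np.exp(-net_input))

print(net_input, activation)                 # 1.3, roughly 0.79
```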

2: “The attraction of parallel distributed processing for modelling cognition” explains why we’re supposed to be interested in these kinds of models in the first place. It elaborates on some interesting characteristics of ANNs: the representation of knowledge is distributed over the whole network, they are damage resistant and fault tolerant, and they allow memory access by content. All of these properties show up in the human brain, but not in classical computer programs. After briefly explaining these properties, there is an extended example of an ANN-based distributed database storing information about various gang members. In addition to being content addressable, it also shows typicality effects – it can be asked a question like “what are the members of the gang ‘Sharks’ like”, and it will naturally retrieve information about their typical ages, occupations, educational backgrounds, and so forth. Likewise, if asked to return the name of a pusher, it will suggest the name of the most typical pusher. In addition to explaining what the model is like, this chapter also explains the reasons why it works the way it does.
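The book’s gang database is an interactive activation network; the toy sketch below is not that model, just a rough spreading-activation illustration with invented gang members, meant to show the flavour of content-addressable retrieval: probing with a cue (“Sharks”) pulls out the attributes that the matching members typically share.

```python
import numpy as np

# Invented gang members (not the book's data): one row per person,
# one column per attribute unit.
attributes = ["Sharks", "Jets", "20s", "30s", "pusher", "burglar"]
people = {
    "Phil": [1, 0, 1, 0, 1, 0],   # Shark, in his 20s, pusher
    "Ken":  [1, 0, 1, 0, 0, 1],   # Shark, in his 20s, burglar
    "Nick": [1, 0, 0, 1, 1, 0],   # Shark, in his 30s, pusher
    "Art":  [0, 1, 0, 1, 0, 1],   # Jet, in his 30s, burglar
}
W = np.array(list(people.values()), dtype=float)   # person-to-attribute connections

# Probe: "what are the Sharks like?" -- activate only the Sharks unit.
probe = np.array([1, 0, 0, 0, 0, 0], dtype=float)

person_activation = W @ probe                   # activation spreads to the matching people...
attribute_activation = person_activation @ W    # ...and back to the attributes they share

for attr, act in zip(attributes, attribute_activation):
    print(attr, act)   # the attributes typical of Sharks come out most active
```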

3: “Pattern association” describes a specific kind of ANN, the pattern associator, and a particular learning rule, the Hebb rule. Pattern associators are networks which are presented with pairs of input and output patterns, and which learn to transform each input pattern into its associated output pattern. They are capable of generalization: if they encounter an input which is similar to ones they have encountered before, they will produce a similar output. They are also fault tolerant, in that they can produce good results even if parts of the network are destroyed. They also automatically perform prototype extraction. Suppose that there is a prototypical “average” apple, and all other apples are noisy versions of the prototype. Pattern associators presented with several different patterns representing apples will learn to react most strongly to an apple which is closest to the prototype, even if they have never actually seen the prototype itself.
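As a small sketch of the Hebb rule itself (my own toy example, not the book’s): each weight grows in proportion to the product of the activations at its two ends, so units that are active together become connected, and a similar input later produces a similar output.

```python
import numpy as np

lr = 0.25   # learning rate (arbitrary)

# One made-up input pattern and the output pattern to be associated with it.
inp = np.array([1, 0, 1, 0], dtype=float)
out = np.array([0, 1, 1], dtype=float)

# Hebb rule: delta_w = lr * output_activation * input_activation,
# i.e. strengthen every connection whose two units are active together.
W = np.zeros((3, 4))
W += lr * np.outer(out, inp)

# Recall with the original input and with a degraded version of it:
print(W @ inp)                                    # same shape as the trained output pattern
print(W @ np.array([1, 0, 0, 0], dtype=float))    # similar input -> similar (weaker) output
```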

4: “Autoassociation” deals with autoassociator networks, and explains how the Delta learning rule works. Autoassociators are a special case of pattern associators – they are taught to reproduce the same pattern at output that was present at input. While this may seem pointless at first, autoassociators are an effective way of implementing a kind of memory: once trained, they can reproduce a complex pattern merely from seeing a small fragment of the original pattern. This has an obvious connection to the human brain, which can e.g. recall a complicated past memory from simply picking up a smell that formed a minor part of the original memory. Autoassociators are also capable of forming categories and prototypes from individual experiences, such as forming a category corresponding to the concept of a dog from seeing several dogs, without explicitly being told that they all belong to the same category. (Or, to put it in Less Wrong jargon, they learn to recognize clusters in thingspace.)
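Here is a hedged sketch of the Delta rule in an autoassociator (again my own illustration, with made-up patterns): weights are nudged in proportion to the difference between the pattern the network should reproduce and what it currently produces, and the trained network then pulls a stored pattern back out of a fragment of it.

```python
import numpy as np

# Three made-up 8-unit patterns to be stored.
patterns = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 1, 0],
], dtype=float)

W = np.zeros((8, 8))
lr = 0.1

# Delta rule: change each weight in proportion to the error between the
# target (the pattern itself) and the network's current output for it.
for _ in range(300):
    for p in patterns:
        output = W @ p
        W += lr * np.outer(p - output, p)

# Cue with a fragment of the first pattern (half of the units switched off).
fragment = patterns[0].copy()
fragment[4:] = 0
recalled = W @ fragment

print(np.round(recalled, 2))
print(np.linalg.norm(fragment - patterns[0]),   # the fragment's distance from the stored memory...
      np.linalg.norm(recalled - patterns[0]))   # ...is larger than the recalled pattern's distance
```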

5: “Training a multi-layer network with an error signal: hidden units and backpropagation” deals with the limitations of single-layered networks and how those limitations can be overcome by using more complex networks that require new kinds of training rules.
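The classic illustration is XOR, which no single-layer network can learn but a network with a hidden layer can. The sketch below is my own minimal numpy version (not the book’s tlearn exercises) of training such a network with backpropagation:

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)   # hidden -> output
lr = 0.5

for _ in range(20000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error signal from the output layer
    # back through the hidden layer.
    d_y = (y - T) * y * (1 - y)
    d_h = (d_y @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_y;  b2 -= lr * d_y.sum(axis=0)
    W1 -= lr * X.T @ d_h;  b1 -= lr * d_h.sum(axis=0)

print(np.round(y, 2))   # should end up close to the XOR targets 0, 1, 1, 0
```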

6: “Competitive networks” differ from previous networks in that they can carry out unsupervised learning: while the previous nets had an explicit teacher signal, competitive networks learn to categorize input patterns into related sets on their own. They can perform both categorization, transforming related inputs into more similar outputs, and orthogonalization, transforming similar inputs into less similar outputs.
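A rough sketch of the competitive learning idea, in the spirit of (though not identical to) the rule described in the book, with made-up data: output units compete for each input, and only the winner moves its weights towards that input, so different units gradually come to stand for different clusters of inputs.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two "clusters" of binary input patterns: active units on the left vs. the right.
left  = np.array([1, 1, 1, 0, 0, 0], dtype=float)
right = np.array([0, 0, 0, 1, 1, 1], dtype=float)

def noisy(p):
    flip = rng.random(p.size) < 0.1            # flip about 10% of the bits
    return np.abs(p - flip)

# Two output units compete; weights start random and are normalised to sum to 1.
W = rng.random((2, 6))
W /= W.sum(axis=1, keepdims=True)
lr = 0.2

for _ in range(200):
    x = noisy(left if rng.random() < 0.5 else right)
    if x.sum() == 0:
        continue
    winner = np.argmax(W @ x)                  # the unit with the largest net input wins
    # The winner moves its weights towards the (normalised) input; losers don't learn.
    W[winner] += lr * (x / x.sum() - W[winner])

# Typically each unit ends up responding to one of the two clusters
# (simple competitive learning can occasionally let one unit grab both).
print(np.round(W, 2))
```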

7: “Recurrent networks” are capable of doing more than just simple transformations: they have feedback loops and can maintain more complicated internal states than non-recurrent networks. This can be used to produce sequences of actions, or to do things like predicting the next letter in a string of words and identifying word boundaries.
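As a minimal sketch of the idea (my own toy example, not one of the book’s simulations): an Elman-style network keeps a copy of its previous hidden state in “context” units, which lets it predict the next letter of a sequence where the correct answer depends on what came before.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy task: predict the next letter of the repeating string "abac...".
# After an "a" the right answer is "b" or "c" depending on what preceded it,
# so the network needs its context units to get this right.
letters = "abc"
seq = "abac" * 300
onehot = {c: np.eye(3)[i] for i, c in enumerate(letters)}

n_in, n_hid, n_out = 3, 8, 3
Wxh = rng.normal(0, 0.5, (n_in, n_hid))    # input -> hidden
Whh = rng.normal(0, 0.5, (n_hid, n_hid))   # context (previous hidden state) -> hidden
Who = rng.normal(0, 0.5, (n_hid, n_out))   # hidden -> output
lr = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(30):
    context = np.zeros(n_hid)
    for cur, nxt in zip(seq, seq[1:]):
        x, t = onehot[cur], onehot[nxt]
        h = sigmoid(x @ Wxh + context @ Whh)   # hidden state mixes input and context
        y = sigmoid(h @ Who)
        # The error is propagated back one step only; the context is treated
        # as just another input, as in Elman's simple recurrent networks.
        d_y = (y - t) * y * (1 - y)
        d_h = (d_y @ Who.T) * h * (1 - h)
        Who -= lr * np.outer(h, d_y)
        Wxh -= lr * np.outer(x, d_h)
        Whh -= lr * np.outer(context, d_h)
        context = h                            # copy the hidden state into the context units

# The trained network should now predict the continuation of "abac..." correctly.
context = np.zeros(n_hid)
for cur in "abacaba":
    h = sigmoid(onehot[cur] @ Wxh + context @ Whh)
    context = h
    print(cur, "->", letters[int(np.argmax(sigmoid(h @ Who)))])
```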

Applications

8: “Reading aloud” can be difficult, especially in a highly irregular language like English, where most rules for transforming spelling into sounds have frequent exceptions. A child has to try to discover the regularities in an environment where there are both regularities and many exceptions. This chapter first briefly discusses traditional “2-route” models of reading aloud, which presume that the brain has one route that uses pronunciation rules to read regular words aloud, and another route which memorizes specific knowledge about the pronunciation of exception words. These are then contrasted with connectionist models, in which there is no distinction between specific information and general rules. “There is only one kind of knowledge – the weights of the connections which the model has acquired as a result of its experiences during training – and this is all stored in a common network.” The chapter then discusses several connectionist models which are successful in reading words aloud correctly, and which produce novel predictions and close matches to experimental psychological data.

9: “Language acquisition” “examines three aspects of language learning by children – learning the past tense, the sudden growth in vocabulary which occurs towards the end of the second year, and the acquisition of syntactic rules”. It describes and discusses various connectionist models which reproduce various peculiarities of children’s language learning. For example, some young children initially correctly learn to produce the past tense of the word “go” as “went”, then later on overgeneralize and treat it as a regular verb, saying “goed”, until they finally re-learn the correct form “went”. As in the previous chapter, models are discussed which are capable of reproducing this and other peculiarities, as well as providing novel predictions of human language learning, at least some of which were later confirmed in psychological studies. The various reasons for such peculiarities are also discussed.

This chapter discusses many interesting issues, among others the fact that vocabulary spurts – dramatic increases in a child’s vocabulary that commonly happen towards the end of the second year – have been taken as evidence of the emergence of a new kind of cognitive mechanism. Experiments with connectionist models show that this isn’t necessarily the case – vocabulary spurts can also be produced without new mechanisms, as learning in an old mechanism reaches a threshold level which allows it to integrate information from different sources better than before.

10: “Connectionism and cognitive development” elaborates on the issue of new mechanisms, discussing the fact that children’s learning appears to advance in stages. Traditionally, such qualitative changes in behavior have been presumed to be due to qualitative changes in the brain’s architecture. This chapter discusses connectionist models simulating the appearance of object permanence – the realization that objects continue to exist even when you don’t see them – and the balance beam problem, in which children are asked to judge the direction in which a balance beam loaded with various weights will tilt. It is shown that as the models are trained, they undergo stage-like change like a child, even though their basic architecture remains constant and the same training rule is used.

11: “Connectionist neuropsychology – lesioning networks” shows that selectively damaging trained networks can closely mimic the performance of various brain-damaged patients. Models of damaged performance are examined in the fields of reading, semantic memory, and attention.
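As a hedged illustration of the general technique (not any of the book’s specific models): train a simple network, then “lesion” it by deleting a growing fraction of its connections and watch how performance degrades as more and more connections are removed.

```python
import numpy as np

rng = np.random.default_rng(4)

# Train a small pattern associator with the delta rule on made-up pattern pairs.
inputs  = rng.integers(0, 2, (5, 12)).astype(float)
targets = rng.integers(0, 2, (5, 6)).astype(float)

W = np.zeros((12, 6))
for _ in range(500):
    for x, t in zip(inputs, targets):
        W += 0.05 * np.outer(x, t - x @ W)

# Lesion the trained network by zeroing a random fraction of its weights.
for fraction in [0.0, 0.1, 0.3, 0.5, 0.7]:
    lesioned = W * (rng.random(W.shape) >= fraction)
    error = np.mean((inputs @ lesioned - targets) ** 2)
    print(f"{fraction:.0%} of connections removed -> mean squared error {error:.3f}")
```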

12: “Mental representation: rules, symbols and connectionist networks” discusses and counters claims that connectionist networks, having no explicit representation of rules or symbols, could never learn the kinds of tasks that seem to require them.

13: “Network models of brain function” discusses two models which attempt to replicate brain functionality and experimental data about the brain: a model of the hippocampus, and a model of the visual cortex. Both are shown to be effective. The hippocampus model is good at storing and recalling patterns of data. The visual cortex model, on the other hand, succeeds in identifying faces regardless of their position in the visual field, and regardless of whether the face is seen from the front or from the side.

14: “Evolutionary connectionism” has a brief discussion of connectionist networks that are trained using evolutionary algorithms.

15: “A selective history of connectionism before 1986” is pretty much what it sounds like.

-----------

In addition to providing a nice analysis of various models originally published in journal papers, the book also provides a bit of a historical perspective. Several of the chapters start by analyzing an early model that produced promising but imperfect results, and then move on to a later model which built on the previous work and achieved more realistic results. References are provided to all the original papers, and readers who are interested in learning more about the various topics discussed in the book are frequently referred to sources which discuss them in more depth.

I would recommend this book to anyone who has an interest in psychology, can handle a bit of math, and isn’t yet familiar with all the various and fascinating things that connectionist models are capable of doing. Although the models discussed in the book are all very simple compared to the networks in the brain, they do make the idea of the brain being successfully reverse-engineered during this century feel a lot more plausible.