Book Review: Cognitive Science (MIRI course list)

I’m reviewing the books on the MIRI course list. After reading Heuristics and Biases I picked up Cognitive Science, by José Luis Bermúdez. It taught me a number of interesting things, but I strongly disliked it.

Cognitive Science

I recommend against reading this book, for reasons that I’ll go into at the end of the review. Before that, I’ll summarize each chapter.


Chapter Summaries

  1. The prehistory of cognitive science

  2. The discipline matures: three milestones

  3. The turn to the brain

  4. Cognitive science and the integration challenge

  5. Tackling the integration challenge

  6. Physical symbol systems and the language of thought

  7. Applying the symbolic paradigm

  8. Neural networks and distributed information processing

  9. Neural network models of cognitive processes

  10. How are cognitive systems organized?

  11. Strategies for mapping the brain

  12. A case study: Exploring mind reading

  13. New horizons: Dynamical systems and situated cognition

  14. Looking ahead: Challenges and Applications

1. The prehistory of cognitive science

This chapter introduces the idea of cognition as a form of information processing, and discusses how that viewpoint came about. Short version:

  1. Behaviorism isn’t enough to explain intelligent behavior.

    • Mice allowed to wander through mazes with no goal were able to navigate those mazes quickly once goals were added; reinforcement on its own does not sufficiently explain learning.

  2. Church & Turing put forth the Church-Turing thesis (anything computable is computable by a Turing machine).

  3. Studying language syntax revealed deep structure and rules.

The chapter illustrates how these insights led to the idea of cognition as information processing.

This chapter is informative from a historical perspective. It might be worth reading.


2. The discipline matures: three milestones

This chapter touches upon three early cognitive science “milestones”: a computer program, an experiment, and a cognitive model.

  1. SHRDLU was a program that used natural language in a very limited setting.

  2. The experiment measured the time it takes, given two images, to determine whether the second depicts the same object as the first, viewed from a different angle (a mental-rotation task).

    • It turns out the amount of time it takes is a linear function of the amount of rotation between views of the object.

    • Because each image contains the same amount of raw visual data (measured in pixels or bits), the extra time must be spent transforming an internal representation; this shows that cognitive processing must be tiered.

  3. This led to a “hierarchical” model of cognition.

    • One layer builds representations out of images; another layer rotates them and pattern-matches.

This chapter is superficial; I wouldn’t recommend it to the LessWrong audience. The study in (2) is interesting, but now that you know about it I’d recommend reading the study directly.


3. The turn to the brain

Eventually people developed the tech to actually watch the brain work, which had an impact on cognitive models.

We have high-level imaging techniques and low-level neuron monitoring. The latter is only done on non-humans.

Scientists studying the brain found discrete cognitive modules. Surgery on monkey brains can reliably induce specific deficits. We can inhibit object recognition (the “what”) and spatial recognition (the “where”) separately, by cutting out specific parts of the brain.

The same studies showed that information can take multiple routes: information from the visual cortex flows along a ventral stream (visual cortex → object recognizer) and a dorsal stream (visual cortex → spatial locator).

Clever experiments with humans allow us to see which parts of the brain are activated by certain activities. Techniques include tracking radioactive tracers (PET) and tracking blood flow (fMRI). By having subjects do similar tasks with small modifications, we can get brain activity data. By aggregating data and subtracting data from control experiments, we can identify brain regions associated with various activities.
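
The subtraction logic itself is simple arithmetic. Here’s a toy sketch with made-up numbers (mine, not the book’s): hypothetical per-region activation averages for a task and its control condition.

```python
import numpy as np

# Hypothetical mean activation over many trials, one value per brain region.
task    = np.array([0.9, 0.4, 0.8, 0.3])  # e.g., naming pictured objects aloud
control = np.array([0.8, 0.4, 0.2, 0.3])  # e.g., viewing the same pictures silently

difference = task - control
print(difference)           # small differences everywhere except region 2
print(difference.argmax())  # region 2: most strongly implicated in the extra step (naming)
```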

The chapter concludes with a brief discussion of the discovery of neurons.

This chapter is superficial; you can get the same info from the “brain” section of an introductory biology course.


4. Cognitive science and the integration challenge

Cognitive Science spans everything from molecular biology to deep philosophy. Unfortunately, we don’t know how all the parts connect yet. This is referred to as the “Integration Challenge”.

The following is presented as an example:

You have four cards. Flip whichever cards you must to verify the rule “If a card has a vowel on one side, then it has an even number on the other”:

E C 4 5

People do notoriously badly at this game. It’s called the “Wason selection task”. It was mentioned in the Sequences a few times. But it turns out that people are much better at this version:

There are four people drinking at a bar. A police officer busts in and needs to verify the rule “If a person is drinking alcohol, they must be at least 21”. Which of the following must they investigate further?

Beer-drinker, Coke-drinker, 25-year-old, 16-year-old

These problems are logically identical. However, most people wrongly suggest flipping the 4 in the card version, while few wrongly suggest checking what the 25-year-old is drinking in the bar version.
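
To make the logical equivalence concrete, here’s a minimal sketch (mine, not from the book) of which cards can actually falsify the rule “vowel → even number”:

```python
def must_flip(visible):
    """A card can falsify "vowel -> even" only if its visible face is a vowel
    (the hidden number might be odd) or an odd number (the hidden letter
    might be a vowel). Consonants and even numbers can't falsify the rule."""
    if visible.isalpha():
        return visible.lower() in "aeiou"
    return int(visible) % 2 == 1

cards = ["E", "C", "4", "5"]
print([c for c in cards if must_flip(c)])  # ['E', '5'] -- the 4 never needs flipping
```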

More generally, it seems that people can do very well on the Wason selection task if it’s framed in such a way that people are looking for cheaters. (Eliminating the police officer from the above story is sufficient to reduce performance.)

This lends credence to the idea of separate “mind modules” that can be activated (such as a cheater-detection module). Research on cognitive biases, evolutionary biology, and economic game theory all play a part in explaining this phenomenon.

It seems that many fields overlap in the cognitive realm. The fact that much research is siloed in separate fields is framed as the “integration challenge”.

The chapter is somewhat interesting, but is primarily superficial and wastes a few pages explaining basic concepts such as the Prisoner’s Dilemma. I’d avoid it.


5. Tackling the integration challenge

This chapter attempts to address the “integration challenge” posed above.

The chapter opens with a brief (uninspired) introduction to reductionism and some wishy-washy reasons why it doesn’t apply to cognitive science. These basically boil down to the following:

Cognitive science ‘laws’ are merely statistical. Cognitive science ‘equations’ (such as the perceived intensity of a stimulus, Ψ = kϕ^n) are descriptive, not explanatory. Therefore, you can’t reduce cognitive science to physics.

Another alternative is explored, the “tri-level hypothesis”, which posits that you can look at the brain as a collection of information-processing systems, each of which should be viewed on three levels (a toy sketch follows the list):

  1. The transformation level (what’s the input, what’s the output, what’s the transformation?)

  2. The algorithmic level (how does the transformation work?)

  3. The mechanical level (how is it implemented in the real world?)
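
Here’s a toy sketch of the three levels, using sorting as the information-processing task (my example, not the book’s):

```python
# 1. Transformation level: what is being computed?
#    Input: a list of numbers. Output: the same numbers in ascending order.
def matches_spec(inp, out):
    return sorted(inp) == out

# 2. Algorithmic level: one particular procedure that computes it.
def insertion_sort(xs):
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] < x:
            i += 1
        out.insert(i, x)
    return out

# 3. Mechanical level: how the algorithm is physically realized. Here it's
#    Python bytecode running on silicon; in a brain it would be patterns of
#    neural firing. The level-2 description is the same either way.
result = insertion_sort([3, 1, 2])
print(result, matches_spec([3, 1, 2], result))  # [1, 2, 3] True
```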

This is also rejected, on the basis that it Just Doesn’t Seem Like Intelligence. (There’s no room for executive control, it’s not clear how learning would occur, and the modules seem like they would be too independent.)

I recommend avoiding this chapter. It is a confused and poor introduction to reductionism that makes many flawed arguments.


6. Physical symbol systems and the language of thought

This chapter opens with a proposal from the 1970s called the “Physical Symbol System Hypothesis”:

A physical symbol system has the necessary and sufficient means for general intelligent action.

It explores this hypothesis a bit. It makes analogies to logic (a symbol game that can be interpreted with meaning), gives some examples of search algorithms (that operate by symbol manipulation), and explores how language is a type of symbol manipulation.

It proceeds to counter this idea with the Chinese Room argument and the Symbol Grounding Problem (how do symbols become meaningful?).

I recommend avoiding this chapter. The first half of the chapter is slow and redundant to anyone moderately familiar with logic/computation. The latter half falls into a number of traps that the Sequences are specifically designed to disarm. Members of the LessWrong community are likely to find it frustrating.


7. Applying the symbolic paradigm

This chapter explores decision algorithms and machine learning. It describes an algorithm designed to build a decision tree from a database (by choosing, at each step, the question with the highest expected information).
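
The scoring step the book describes sounds like information-gain-based (ID3-style) tree building. Here’s a minimal sketch of that step, using a made-up toy database (the names and data are mine, for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, question):
    """Expected reduction in entropy from splitting the database on `question`
    (a function mapping a row to an answer)."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(question(row), []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy database: which symptom best predicts the flu?
rows = [{"fever": True,  "cough": True},  {"fever": True,  "cough": False},
        {"fever": False, "cough": True},  {"fever": False, "cough": False}]
labels = [True, True, False, False]
print(information_gain(rows, labels, lambda r: r["fever"]))  # 1.0 bit  -- ask this first
print(information_gain(rows, labels, lambda r: r["cough"]))  # 0.0 bits -- useless question
```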

The algorithm was somewhat frustrating, because simple probabilistic reasoning could have improved the results significantly. (And because the algorithm was presented as a brilliant real-world example and not as a simple toy.)

The latter part of the chapter discusses some robots that do symbol manipulation to achieve their goals (such as SHAKEY).

I recommend avoiding this chapter. It’s painfully slow for someone already familiar with computation and the basics of information theory.

(If you don’t know about computation and the basics of information theory, I suggest learning them elsewhere.)


8. Neural networks and distributed information processing

This chapter begins with a brief introduction to how a neuron operates and describes artificial neural networks.

It wastes some time explaining Boolean functions, and then introduces the “perceptron convergence rule”, a learning algorithm that causes any single-layer neural net to converge on a correct set of weights (assuming the target function can be encoded in a single-layer neural net).

Some examples are given before showing that a single-layer neural net cannot implement XOR. This is followed by the more general claim that single-layer neural nets can only implement linearly separable functions.

Multi-layer neural nets can implement any function (in principle), but no equivalent convergence guarantee exists for multi-layer nets. Alternative learning mechanisms, such as backpropagation of error, are explored. However, backpropagation is non-local and biologically implausible. (There is some evidence against backpropagation of error in brains.)
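
As a concrete illustration (my sketch, not the book’s code), here is the perceptron update rule converging on AND but never settling on XOR:

```python
# Single threshold unit trained with the classic perceptron rule:
# w <- w + lr * (target - prediction) * x
def train_perceptron(examples, epochs=50, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in examples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred
            w = [w[0] + lr * err * x1, w[1] + lr * err * x2]
            b += lr * err
    return lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_unit = train_perceptron(AND)
xor_unit = train_perceptron(XOR)
print([and_unit(x1, x2) for (x1, x2), _ in AND])  # [0, 0, 0, 1]: converged
print([xor_unit(x1, x2) for (x1, x2), _ in XOR])  # at least one wrong: XOR isn't linearly separable
```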

The remainder of the chapter explores the difference between the neural-net brain architectures and physical symbol processors. Major differences include:

  • There is no clear distinction between information storage and information processing in a neural net

  • There is no distinction between representation and rules in a neural net.

This is contrasted with a Turing machine, in which the symbols and rules are quite distinct.

This chapter has some signal in it, but it’s mostly lost in the noise of superficial overviews (at the beginning) and a false dichotomy (at the end).


9. Neural network models of cognitive processes

This chapter discusses learning. Those who model the brain as a symbol-manipulation engine view learning very differently from those who model the brain as a neural net.

From the symbol-manipulation side, the following argument is presented:

In order to learn English, you must learn the truth conditions of sentences. These truth conditions must be expressed in a lower level language. That lower level language is the language of thought, which must be innate.

The neural-net side offers an alternative: learning can proceed by strengthening or weakening connections between neurons. Some experimental evidence supports this claim:

When children are learning to speak, they go through a few phases with irregular verbs (give/gave). Early on, children conjugate correctly (gave). Later, they regress and start conjugating irregular verbs like regular verbs (gived). Later still, they regain their original skill.

A few studies showed that verb-conjugating neural nets exhibit a similar pattern, if they’re trained on irregular verbs for a little while before their vocabulary is increased.

Other examples of humans-learning-like-neural-nets are presented, and are interesting.

The latter half of the chapter cites various studies concerned with infant learning. Some proponents of the physical symbol hypothesis have claimed that a baby’s world must be a chaos of sensory overload (as the baby cannot yet parse the world into representations). However, studies show that this is not the case.

It turns out that babies look longer at things which surprise them. This can be used to measure what babies find surprising. Babies scrutinize physics-defying videos longer than physics-obeying videos, implying that a certain amount of world knowledge is “baked in”.

The chapter concludes by acknowledging that neural nets and physical symbol systems are not mutually exclusive, and that the latter may be implemented by the former. However, neural nets show that there is a non-symbolic way to process information, which undercuts the claim that physical symbol systems are necessary for general intelligent action. We should broaden our view to include more than just the physical symbol hypothesis in our model of cognition.

The points made in this chapter are interesting and were mostly new to me. If any of the above was surprising, this chapter is likely worth your time.


10. How are cognitive systems organized?

This chapter explores agent architectures.

One architecture, the reflex agent, can’t be termed intelligent: the input is linked directly to the action system via a small set of rules.

Another architecture, the goal-based agent, has a model of the world and a set of goals, and takes actions to achieve those goals. It has no mechanism for learning.

The third architecture, the learning agent, adds a memory and some learning mechanism.
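
A minimal sketch of the first two architectures (my toy framing, not the book’s): a reflex agent maps percepts straight to actions, while a goal-based agent consults a world model. A learning agent would additionally update that model from experience.

```python
# Toy domain: keep a room at 20 degrees by heating, cooling, or waiting.

def reflex_agent(percept):
    # Condition-action rules wired directly from input to output.
    return "heat" if percept["temp"] < 20 else "cool"

def goal_based_agent(percept, world_model, goal_temp=20):
    # Predicts the outcome of each action with a world model, then picks
    # the action whose predicted state is closest to the goal.
    actions = ["heat", "cool", "wait"]
    return min(actions, key=lambda a: abs(world_model(percept, a) - goal_temp))

def simple_model(percept, action):
    delta = {"heat": +1, "cool": -1, "wait": 0}[action]
    return percept["temp"] + delta

print(reflex_agent({"temp": 20}))                    # cool -- blindly overshoots
print(goal_based_agent({"temp": 20}, simple_model))  # wait -- already at the goal
```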

Such architectures are naturally described in a modular manner (input systems, output systems, memory systems, prediction systems, goal-evaluation systems, etc.). Furthermore, the brain studies from earlier imply some degree of modularity.

This gives rise to the modularity hypothesis, which claims that the brain is modular.

A “massive modularity” hypothesis is put forth, claiming that the brain is entirely made of domain-specific modular components.

The arguments in favor of modularity include:

Evolution baked in many behaviors. For example, we have some degree of willingness to sacrifice ourselves for family. This isn’t because we run the numbers and discover that our genes propagate better that way. Rather, it’s because genes for such behavior actually did propagate further. We have “Darwinian modules” that have been baked in for fitness reasons. (Readers may notice echoes of The Sequences here.)

Given the wide array of domain-specific and unrelated adaptations that evolution must provide (cheater-detection, facial-recognition, emotion-extrapolation, folk-physics, etc.) it makes sense to consider these modules separately.

In other words, because evolution is not coordinated we should expect specific evolutionary advantages to be realized as domain-specific, encapsulated cognitive modules.

In its strongest form, the massive modularity hypothesis states that there are no domain-general fitness criteria, so evolution cannot create domain-general cognitive mechanisms. Its proponents claim the brain has no domain-general central processing, but rather is a collection of domain-specific modules.

Note: This argument has as many holes in the book as it has in my summary.

Counter-arguments are then presented illustrating why the strong massive modularity hypothesis is silly:

  • “Darwinian modules” are not applied in a mandatory fashion: people seem capable of overriding the self-sacrifice module.

  • Modules take a limited range of inputs. How are those inputs selected? Inputs to a cheater-detection-module must be representations of social exchanges. There must be a filter that filters some larger data set down to just the social exchanges. What determines the input for that filtering module? Continue up the stack until you have something that is operating on very wide inputs. This is domain general.

  • How could domain-general learning be possible with only domain-specific modules?

Or, in other words, “humans seem pretty domain-general to me”.

The book concludes that while there may be many modules, the brain is not only domain-specific modules.

The remainder of the chapter studies a “hybrid architecture” called ACT-R/PM. The example struck me as a stretch to prove a point, and did not seem relevant.

I’d avoid this chapter. It wasted a lot of time on a false dichotomy between modularity and “having some sort of central executor”, propped up by a straw-man argument. It didn’t provide new insights.


11. Strategies for mapping the brain

This chapter returns to brain-scanning tools. It gives an overview of brain areas. It briefly mentions mirror neurons (neurons that fire both when you do something and when you watch somebody else do it). It discusses many techniques for looking at brains. Some are invasive, single-neuron, and only done on non-humans. Others are high-level and observe things like blood flow, which may or may not be a good indicator of brain activity. We lack tools to study the middle ground: regions of neurons and their connections. The chapter concludes with a discussion of the potential pitfalls of drawing conclusions from incomplete data.

This chapter was a rehash of chapter three. I found it useless.


12. A case study: Exploring mind reading

This chapter explores the mechanisms of empathy.

It first explores the ability to play “make believe” as an early form of metarepresentation (representing a representation), a concept assumed to be necessary before one can disconnect representations from what they represent. For example, a metarepresentation allows you to go from “The world is in state X” to “Susan believes the world is in state X”, which requires you to represent the world-representation “state X” as the subject of Susan’s belief.

(In other words, playing make-believe is seen as a crucial step in developing a model of beliefs.)

This is presumed crucial in developing a theory of mind. (Indeed, autistic children tend not to play “make believe” and have difficulty passing false belief tests, which require a working theory of mind.)

The false belief test is introduced:

Jane and Sally see a marble placed in the basket. Jane leaves the room. Sally sees the marble moved to the box. Sally is asked where Jane will look for the marble.

A “theory of mind” brain-module is postulated. It takes a bit of heat when studies show that, even after children pass the false belief test, true beliefs are easier to model than false beliefs.

This is followed by a completely different theory claiming that metarepresentation is not required to pass the false belief test. This model claims that a child can represent a relationship between Sally and false world-states without ever representing a representation.

Analogies are drawn to counterfactual thinking: you can think about how you could be eating a different sandwich, without thinking about how you think about sandwiches. Similarly, you can represent someone having false beliefs without representing beliefs.

The book moves to the simulation model of empathy, which claims that you empathize with people by putting yourself in their shoes, pretending you believe what they believe, and assessing how you would feel.

This argument is supported by studies of people with specific brain damage. It turns out that people who can’t feel fear also have trouble identifying fear (but not disgust) in others, and people who can’t feel disgust have trouble identifying disgust (but not fear) in others.

There is some evidence showing that certain brain regions are active only when attributing false beliefs to someone. Different people interpret this evidence in different ways.

This chapter had a lot of signal. The debate has much more depth than I’ve covered here. This chapter is worth your time.


13. New horizons: Dynamical systems and situated cognition

Some people argue that we’re doing cognitive science wrong. When we want to govern a steam engine, we don’t design a computer that samples the engine speed and adjusts the throttle: rather, we attach the throttle to a spinning governor whose weights rise as the engine speeds up and close the valve. Then we just let gravity sort the damn thing out.
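
Here’s a toy sketch of the dynamical-systems point (mine, not from the book): two coupled quantities, engine speed and valve opening, that settle toward an equilibrium with no step that samples, represents, or decides anything.

```python
# Crude Euler-integrated "governor": the flyweights rise with speed and close
# the valve; the engine speeds up with the valve opening and is slowed by load.
def simulate(steps=2000, dt=0.01, load=0.5):
    speed, valve = 0.0, 1.0
    for _ in range(steps):
        valve += dt * (1.0 - 0.5 * speed - valve)  # valve relaxes toward (1 - 0.5 * speed)
        speed += dt * (valve - load * speed)       # engine dynamics under the current load
    return speed

print(round(simulate(), 2))  # settles near 1.0 for this load; no controller code anywhere
```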

Similarly, crickets run towards other cricket-noises: but it turns out they aren’t processing sounds and deciding that cricket noises are sexy. Rather, their ears (which are in their legs (which are hollow)) are directly hooked to the motor output. When the ears are properly stimulated, the legs move towards the sound. No brain necessary.

These people claim that we should look at humans on a dynamical level instead of postulating all of these high-falutin’ modules.

Some studies with infants show that they act dynamically. If you put a toy in box A a few times (training them to reach for A) and then put it in box B, they’ll reach for box B.

Unless you restrain them for a few seconds, in which case they’ll go for box A. Unless you stand them up, in which case they go for box B again. Environment, time, and muscle position all seem to affect infant activity. This implies that non-cognitive factors must be taken into account when assessing infant cognition.

Detractors argue that while dynamics leads to good data, dynamics does not lead to understanding. Modelling traffic as a multi-particle system allows you to predict traffic jams better but it hardly tells you why the model works. For that, you need some model of human intent and perhaps a bit of game theory.

Also, subjectively, brains don’t feel completely dynamical.

The remainder of the chapter is spent exploring dynamical robots. They’re neat.

This chapter provides an interesting new way to look at cognition. It is worth a read.


14. Looking ahead: Challenges and Applications

This chapter is very short. It covers some promising areas of research:

  1. The Human Connectome Project (studying neural connections in the brain).

  2. Studying what the brain is doing when it’s idle (instead of doing experiments where we just “subtract” the control).

  3. Neuroprostheses (substitute brain modules).

  4. Improved education (leveraging what we’ve learned about cognition).

  5. Crossovers from cognitive science to economics (how do Homo sapiens differ from Homo economicus, and why?) and law (who is responsible for what?).

  6. Cognitive scientists have long been careful to avoid consciousness. Now people are starting to approach the problem. Philosophical zombies are presented seriously.

If you’re interested in any of the above points, I recommend learning about them elsewhere. This chapter is short and does not have much data. I recommend skipping it.

Reactions

This book bored and frustrated me, for a number of reasons.

  1. Assumed a low level of technical competence from its readers: Many basic concepts were introduced superficially but at length. Examples include the structure of neurons and the prisoner’s dilemma. I felt talked down to throughout much of the book. (One of the exercises was “Give a truth table for OR.”)

  2. Gave serious credence to silly ideas: The Chinese Room argument and Philosophical Zombies were given serious credence in this text. The fact that cognitive science is very far from reduction to physics was used as an excuse to write off reductionism entirely. Such floundering is somewhat expected in any text that touches mainstream philosophy and strives for “political correctness”, but I expected more from the MIRI course list.

  3. Lacked technical arguments: The book had a lot of words and not a lot of data. Much of the book was in “he-said she-said” format, explaining debates between cognitive scientists. I understand that the field of cognitive science is not mature enough to make many technical arguments, but even the debates were highly summarized. Worse, they were full of incomplete or confused arguments. I would have been much happier if the author presented the data and stepped aside.

  4. The author was incapable of drawing conclusions: The author presented many arguments, argued one side, and then concluded with “or, at least, that’s what so-and-so says.” Some of the positions were very poorly argued, and it was difficult for me to tell whether this was due to the author’s misunderstanding or whether the author was faithfully relaying confused arguments.

All in all, the book was a lot of noise with very little signal. I was expecting much more, especially given its prominent position as the first book on the MIRI course recommendations. It felt like a superficial introduction to low-level concepts. It’s geared towards high school students or freshman undergrads who do not yet know how to think critically.

I was expecting a rationalist introduction to cognitive science: “Here’s what we think we know, here’s how we know it. Here’s the gray areas, here’s what professionals say.” I feel you could boil away 80% of this book without loss.

Note: I got this book used. The previous owner had no idea how to effectively use a highlighter. Many unimportant phrases were highlighted. I found it difficult not to read them LOUDLY. This increased my frustration. I have tried to adjust accordingly, but it may be biasing my review.

Note: I assume that the concepts of brain modularity, neural nets, information-as-entropy, and symbol-processing are familiar to MIRI’s target audience. If that is not the case then this book provides a passable sketch for how to start thinking about minds. Even so, I would not recommend it for that purpose on account of point (2) above.

What I learned

The book was not all bad. There was some signal in the noise. But first, a rebuttal:

Cognitive Science seemed sorely confused about one thing in particular: the operation of neural nets. The neural-net vs. physical-symbol debate seemed like a false dichotomy.

A neural net implementing an AND gate is manipulating symbols, if you consider it on a sufficiently abstract level.
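
To make that concrete, here’s a minimal sketch (mine, not the book’s) of a single fixed-weight unit computing AND. Described at a low level it’s a weighted sum; described abstractly it’s a rule over the symbols “true” and “false”:

```python
# One threshold unit with hand-set weights. At the physical level it's just
# arithmetic; at the symbolic level it's the rule "output 1 iff both inputs are 1".
def and_unit(a, b, w1=0.6, w2=0.6, threshold=1.0):
    activation = w1 * a + w2 * b
    return 1 if activation >= threshold else 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_unit(a, b))  # reproduces the AND truth table
```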

A cricket’s legs are doing calculations to identify cricket-noises and move in that direction. The algorithm was designed by Time and Evolution, and it’s run on Physics directly (instead of on the cricket’s brain), but the computation still occurs.

A neural net does process symbols according to rules. It doesn’t just come to magic answers because it has magic weights: a well-trained neural net works because the propagation of activation through the net mimics the causal structure of the real world.

The author missed or neglected these points entirely.

That said, this book did teach me something about symbol-processing and neural nets. It taught me that the symbols in a neural net are much looser than the symbols you’ll find in modern computers. In a “strict” symbol-manipulation system, representations are all-or-nothing: either all the symbols are in place and things work, or one symbol is out of place and everything breaks.

Neural nets are significantly more versatile: a neural net can be strong enough to recognize an answer without being strong enough to produce the answer. With good training, a neural net can slowly build a good representation out of random starting weights.

In essence, Cognitive Science showed me that symbols needn’t be discrete: they can be continuous, convoluted, and blurry.

(Within brains, they often are all three.)

This seems obvious in retrospect, but “fuzzy symbols” were a novel concept to me.

Here’s a few other tidbits that I took away:

  • The perceptron convergence rule (and its inapplicability to multi-layer neural nets) was new to me.

  • Babies do not live in a land of sensory chaos. I was once taught otherwise. Cognitive Science dispelled a false belief with interesting data, and for that I am thankful.

  • The discussion of how evolution can create modules in brains was an interesting one, and gave me a new way of looking at domain-specific brain functions.

  • I was quite surprised by the Wason Selection Task results (where framing the problem in a way that puts you on the lookout for cheaters improves performance).

Finally, this book gives you a great overview of the history of Cognitive Science. It’s easy to believe that the modern cognitive model is obvious if that’s what you learned first. It’s illustrative to see how difficult it was to build that model up and what alternatives were considered along the way.

I’m sure I picked up a few other things that have been lost to hindsight bias.

What should I read?

I recommend avoiding this book. If any of the above subjects interest you, I recommend finding other sources that focus on those subjects in more depth.

In fact, I recommend finding a better Cognitive Science book for the MIRI course list: the university course using this book might well be good, but I expect the book to frustrate the type of people who tackle the book list directly. It surely frustrated me.

A book with more data and more technical arguments, which assumes a high level of competence in its readers, would be a marked improvement.