Sequence introduction: non-agent and multiagent models of mind
A typical paradigm by which people tend to think of themselves and others is as consequentialist agents: entities who can be usefully modeled as having beliefs and goals, who are then acting according to their beliefs to achieve their goals.
This is often a useful model, but it doesn’t quite capture reality. It’s a bit of a fake framework. Or in computer science terms, you might call it a leaky abstraction.
An abstraction in the computer science sense is a simplification which tries to hide the underlying details of a thing, letting you think in terms of the simplification rather than the details. To the extent that the abstraction actually succeeds in hiding the details, this makes things a lot simpler. But sometimes the abstraction inevitably leaks, as the simplification fails to predict some of the actual behavior that emerges from the details; in that situation you need to actually know the underlying details, and be able to think in terms of them.
Agent-ness being a leaky abstraction is not exactly a novel concept for Less Wrong; it has been touched upon several times, such as in Scott Alexander’s Blue-Minimizing Robot Sequence. At the same time, I do not think that it has been quite fully internalized yet, and I think that many foundational posts on LW go wrong due to being premised on the assumption of humans being agents. In fact, I would go as far as to claim that this is the biggest flaw of the original Sequences: they were attempting to explain many failures of rationality as being due to cognitive biases, when in retrospect it looks like understanding cognitive biases doesn’t actually make you substantially more effective. But if you are implicitly modeling humans as goal-directed agents, then cognitive biases are the most natural place for irrationality to emerge from, so it makes sense to focus the most on them.
Just knowing that an abstraction leaks isn’t enough to improve your thinking, however. To do better, you need to know about the actual underlying details to get a better model. In this sequence, I will aim to elaborate on various tools for thinking about minds which look at humans in more granular detail than the classical agent model does. Hopefully, this will help us better get past the old paradigm.
My model of what our subagents look like draws upon a number of different sources, including neuroscience, psychotherapy and meditation, so in the process of sketching out my model I will be covering a number of them in turn. To give you a rough idea of what I’m trying to do, here’s a summary of some upcoming content.
Published posts:
(Note: this list may not always be fully up to date; see the sequence index for the actively maintained version.)
Book summary: Consciousness and the Brain. One of the fundamental building blocks of much of consciousness research is Global Workspace Theory (GWT). This could be described as a component of a multiagent model, focusing on the way in which different agents exchange information with one another. One elaboration of GWT, which focuses on how it might be implemented in the brain, is the Global Neuronal Workspace (GNW) model in neuroscience. Consciousness and the Brain is a 2014 book that summarizes some of the research and basic ideas behind GNW, so summarizing the main content of that book looks like a good place to start our discussion and to get a neuroscientific grounding before we get more speculative.
Building up to an IFS model. One theoretical approach for modeling humans as being composed of interacting parts is that of Internal Family Systems. In my experience and that of several other people in the rationalist community, it’s very effective for this purpose. However, having its origins in therapy, its theoretical model may seem rather unscientific and woo-y. This personally put me off the theory for a long time, as I thought that it sounded fake, and gave me a strong sense of “my mind isn’t split into parts like that”.
In this post, I construct a mechanistic sketch of how a mind might work, drawing on the kinds of mechanisms that have already been demonstrated in contemporary machine learning, and then end up with a model that pretty closely resembles the IFS one.
Subagents, introspective awareness, and blending. In this post, I extend the model of mind that I’ve been building up in previous posts to explain some things about change blindness, not knowing whether you are conscious, forgetting most of your thoughts, and mistaking your thoughts and emotions as objective facts, while also connecting it with the theory in the meditation book The Mind Illuminated.
Subagents, akrasia, and coherence in humans. We can roughly describe coherence as the property that, if you become aware that there exists a more optimal strategy for achieving your goals than the one that you are currently executing, then you will switch to that better strategy. For a subagent theory of mind, we would like to have some explanation of when exactly the subagents manage to be collectively coherent (that is, change their behavior to some better one), and what are the situations in which they fail to do so.
My conclusion is that we are capable of changing our behaviors on occasions when the mind-system as a whole puts sufficiently high probability on the new behavior being better, when the new behavior is not being blocked by a particular highly weighted subagent (such as an IFS-style protector) that puts high probability on it being bad, and when we have enough slack in our lives for any new behaviors to be evaluated in the first place. Akrasia is subagent disagreement about what to do.
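The three conditions in that conclusion can be written out as a toy decision rule. This is only a sketch under stated assumptions: the `Subagent` class, the weights, and the `threshold`, `veto_weight` and `veto_p` parameters are hypothetical names and numbers chosen for illustration, not anything specified in the post itself.

```python
from dataclasses import dataclass


@dataclass
class Subagent:
    name: str
    weight: float    # hypothetical: how much influence this subagent has
    p_better: float  # its probability that the new behavior is better


def will_switch(subagents, slack, threshold=0.5, veto_weight=0.4, veto_p=0.2):
    """Toy rule: does the mind-system as a whole adopt the new behavior?"""
    if not slack:
        # Without slack, the new behavior never gets evaluated at all.
        return False
    total = sum(s.weight for s in subagents)
    # Weighted average: the system-wide probability that the new behavior is better.
    system_p = sum(s.weight * s.p_better for s in subagents) / total
    # A highly weighted subagent (e.g. a protector) that is confident the
    # behavior is bad blocks the switch regardless of the average.
    blocked = any(
        s.weight / total >= veto_weight and s.p_better <= veto_p
        for s in subagents
    )
    return system_p >= threshold and not blocked


agreeing = [Subagent("planner", 1.0, 0.9), Subagent("habit", 1.0, 0.6)]
conflicted = [Subagent("planner", 1.0, 0.95), Subagent("protector", 1.0, 0.1)]

switch_when_agreeing = will_switch(agreeing, slack=True)
switch_without_slack = will_switch(agreeing, slack=False)
switch_when_blocked = will_switch(conflicted, slack=True)
```

In the `conflicted` case the weighted average still clears the threshold, yet the switch fails because of the protector's veto. That is the akrasia-as-disagreement pattern in miniature.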
Integrating disagreeing subagents. In the previous post, I suggested that akrasia involves subagent disagreement—or in other words, different parts of the brain having differing ideas on what the best course of action is. The existence of such conflicts raises the question, how does one resolve them?
In this post, I discuss various techniques which could be interpreted as ways of resolving subagent disagreements, as well as some of the reasons why this doesn’t always happen.
Subagents, neural Turing machines, thought selection, and blindspots. In my summary of Consciousness and the Brain, I briefly mentioned that one of the functions of consciousness is to carry out artificial serial operations; or in other words, implement a production system (equivalent to a Turing machine) in the brain.
While I did not go into very much detail about this model in the post, I’ve used it in later articles. For instance, in Building up to an Internal Family Systems model, I used a toy model where different subagents cast votes to modify the contents of consciousness. One may conceptualize this as equivalent to the production system model, where different subagents implement different production rules which compete to modify the contents of consciousness.
In this post, I flesh out the model a bit more, as well as applying it to a few other examples, such as emotion suppression, internal conflict, and blind spots.
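To make the correspondence between voting and production rules concrete, here is a minimal sketch. It assumes that each subagent is a (condition, proposal, weight) rule and that the proposal with the most weighted votes becomes the next content of consciousness. The rule format, the weights, and the cake example are all hypothetical illustrations, not the model from the posts.

```python
from collections import Counter


def step(workspace, rules):
    """One cycle: every matching rule votes for its proposal; the top vote wins."""
    votes = Counter()
    for condition, proposal, weight in rules:
        if condition(workspace):
            votes[proposal] += weight
    if not votes:
        return workspace  # no rule fires, so the contents stay the same
    return votes.most_common(1)[0][0]


def run(workspace, rules, max_steps=10):
    """Apply step() repeatedly until the contents stop changing."""
    trace = [workspace]
    for _ in range(max_steps):
        nxt = step(trace[-1], rules)
        if nxt == trace[-1]:
            break
        trace.append(nxt)
    return trace


# Hypothetical subagents: two compete over "see cake", one reacts to the winner.
rules = [
    (lambda w: w == "see cake", "eat cake", 1.0),
    (lambda w: w == "see cake", "remember diet", 1.5),
    (lambda w: w == "remember diet", "walk away", 1.0),
]

trace = run("see cake", rules)
```

Each pass through `step` is one serial operation: the winning proposal overwrites the workspace, and the new contents determine which rules can fire next.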
Subagents, trauma, and rationality. This post interprets the appearance of subagents as emerging from unintegrated memory networks, and argues that the presence of these is a matter of degree. There’s a continuous progression of fragmented (dissociated) memory networks giving rise to increasingly worse symptoms as the degree of fragmentation grows. The continuum goes from everyday procrastination and akrasia on the “normal” end, to disrupted and dysfunctional beliefs in the middle, and conditions like clinical PTSD, borderline personality disorder, and dissociative identity disorder on the severely traumatized end.
I also argue that emotional work, and exploring one’s past traumas in order to heal them, is necessary for effective instrumental and epistemic rationality.
Against “System 1” and “System 2”. The terms System 1 and System 2 were originally coined by the psychologist Keith Stanovich and then popularized by Daniel Kahneman in his book Thinking, Fast and Slow. Stanovich noted that a number of fields within psychology had been developing various kinds of theories distinguishing between fast/intuitive thinking on the one hand and slow/deliberative thinking on the other. Often these fields were not aware of each other. The S1/S2 model was offered as a general version of these specific theories, highlighting features of the two modes of thought that tended to appear in all the theories.
Since then, academics have continued to discuss the models. Among other developments, Stanovich and other authors have discontinued the use of the System 1/System 2 terminology as misleading, choosing to instead talk about Type 1 and Type 2 processing. In this post, I will build on some of that discussion to argue that Type 2 processing is a particular way of chaining together the outputs of various subagents using working memory. Some of the processes involved in this chaining are themselves implemented by particular kinds of subagents.
Book summary: Unlocking the Emotional Brain. Written by the psychotherapists Bruce Ecker, Robin Ticic and Laurel Hulley, Unlocking the Emotional Brain claims to offer a neuroscience-grounded, comprehensive model of how effective therapy works. In so doing, it also happens to formulate its theory in terms of belief updating, helping explain how the brain models the world and what kinds of techniques allow us to actually change our minds. Its discussion and models are closely connected to the models about internal conflict and belief revision that are discussed in previous posts, particularly “integrating disagreeing subagents”.
A mechanistic model of meditation. Meditation has been claimed to have all kinds of transformative effects on the psyche, such as improving concentration ability, healing trauma, cleaning up delusions, allowing one to track their subconscious strategies, and making one’s nervous system more efficient. However, an explanation for why and how exactly this would happen has typically been lacking. This makes people reasonably skeptical of such claims.
In this post, I want to offer an explanation for one kind of a mechanism: meditation increasing the degree of a person’s introspective awareness, and thus leading to increasing psychological unity as internal conflicts are detected and resolved.
A non-mystical explanation of insight meditation and the three characteristics of existence: introduction and preamble. Insight meditation, enlightenment, what’s that all about?
The sequence of posts starting from this one is my personal attempt at answering that question. It seeks to:
Explain what kinds of implicit assumptions build up our default understanding of reality and how those assumptions are subtly flawed.
Point out aspects from our experience whose repeated observation will update those assumptions, and explain how this may cause psychological change in someone who meditates.
Explain how the so-called “three characteristics of existence” of Buddhism (impermanence, no-self and unsatisfactoriness) are all interrelated, and how they tie in with the previously discussed topics in the sequence.
Farther out (sketched out but not as extensively planned/written yet)
The game theory of rationality and cooperation in a multiagent world. Multi-agent models have a natural connection to Elephant in the Brain-style dynamics: our brains doing things for purposes of which we are unaware. Furthermore, there can be strong incentives to continue systematic self-deception and not integrate conflicting beliefs. For instance, if a mind has subagents which think that specific beliefs are dangerous to hold or express, then they will work to keep the subagents holding those beliefs from coming into conscious awareness.
“Dangerous beliefs” might be ones that touch upon political topics, but they might also be ones of a more personal nature. For instance, someone may have an identity as being “good at X”, and then want to rationalize away any contradictory evidence—including evidence suggesting that they were wrong on a topic related to X. Or it might be something even more subtle.
These are a few examples of how rationality work has to happen on two levels at once: to debug some beliefs (individual level), people need to be in a community where holding various kinds of beliefs is actually safe (social level). But in order for the community to be safe for holding those beliefs (social level), people within the community also need to work on themselves so as to deal with their own subagents that would cause them to attack people with the “wrong” beliefs (individual level). This kind of work also seems to be necessary for fixing “politics being the mind-killer” and collaborating on issues such as existential risk across sharp value differences; but the need to carry out the work on many levels at once makes it challenging, especially since the current environment incentivizes many (sub)agents to sabotage any attempt at this.
(This topic area is also related to that stuff Valentine has been saying about Omega.)
This sequence is part of research done for, and supported by, the Foundational Research Institute.
Vaniver has said most of the things I want to say here, but there are some additional things I want to say:
I think building models of the mind is really hard. I also notice that in myself, building models of the mind feels scary in a way that often prevents me from thinking sanely in many important situations.
I think the reasons it feels scary are varied and complicated, but a lot of it boils down to three things: a purely physically reductionistic approach to modeling minds is often difficult; my standards for evidence often feel calibrated for domains like physics, other hard sciences, and mathematics; and it’s often hard to communicate to others why I believe minds work a certain way, since a substantial portion of my reasons is internal and difficult to put into words.
But building explicit and broad models of our mind, like this sequence does, strikes me as essential to being effective in the world.
Overall, I think this sequence had a positive effect on me for two reasons:
It provided me with a set of concrete models of the mind that I have used a few times since then
It rekindled a certain courage in me to allow myself to build these kinds of models in the first place, and I hope it has done the same for others.
I think for me at least the second effect was larger than the first one, though both are pretty substantial.
Yeah, that used to bother me too. When I learned about multi-agent theory and pondered it, I of course pointed my attention inward, trying to observe it.
Then agents arose and started talking with each other, arguing about the fact that they can’t tell if they’re actually representatives of underlying structures and coalitions of the neural substrate or just one fanciful part, that’s engaged in puppet phantasy play. Or what the boundaries between those two even are.
Or if their apparent existence is valid evidence for multi-agent theories being any good. Well, I suppose I wasn’t bothered, they were bothered :) I/They just really badly wanted a real-time brain scan to get context for my perceptions.
Eventually, I embraced the triplethink of operational certainty [minimizes internal conflict, preserves scarce neurotransmitters], meta doubt, and the meta-meta awareness that propositions expressible in conscious language can’t capture the complexity of the neural substrate anyway.
All models are wrong, yet modeling is essential.
While this sequence ended up spanning more than 2019, I think this represents some of the best ‘psychology’ on LW in 2019, and have some hope (like Hazard) that all of it will get represented or collected in some way.
Writing a pitch for the sequence feels like writing a pitch for writing about psychology on LW in general, as the sequence itself has it all: book reviews, highly upvoted posts, clear explanations of detailed models, commentary from other experts in the field. So why care about psychology on LW? Because 1) it’s often a source of rapid advances in personal effectiveness, 2) the fact that these sorts of problems are often ‘adaptive’ makes them difficult to solve and thus fosters learned helplessness or unproductive thrashing, and 3) taking a systematic, rational view helps separate the wheat from the chaff (when it comes to the advice and models) and also helps make the ‘squishy’ sort of self-development accessible to those suspicious of plans and models with poor justification.
I do think, at least for the IFS material, that it’d be useful to pull in at least this comment, and possibly more of the discussion with pjeby more broadly.