Being a Robust Agent
Second version, updated for the 2018 Review. See change notes.
There’s a concept which many LessWrong essays have pointed at (indeed, I think the entire sequences explore it). But I don’t think there’s a single post really spelling it out explicitly:
You might want to become a more robust, coherent agent.
By default, humans are a kludgy bundle of impulses. But we have the ability to reflect upon our decision making, and the implications thereof, and derive better overall policies.
Some people find this naturally motivating – it’s aesthetically appealing to be a coherent agent. But if you don’t find it naturally appealing, the reason I think it’s worth considering is robustness – being able to succeed at novel challenges in complex domains.
This is related to being instrumentally rational, but I don’t think they’re identical. If your goals are simple and well-understood, and you’re interfacing in a social domain with clear rules, and/or you’re operating in domains that the ancestral environment would have reasonably prepared you for… the most instrumentally rational thing might be to just follow your instincts or common folk-wisdom.
But instinct and common wisdom often aren’t enough, such as when...
You expect your environment to change, and default-strategies to stop working.
You are attempting complicated plans for which there is no common wisdom, or where you will run into many edge-cases.
You need to coordinate with other agents in ways that don’t have existing, reliable coordination mechanisms.
You expect instincts or common wisdom to be wrong in particular ways.
You are trying to outperform common wisdom. (i.e. you’re a maximizer instead of a satisficer, or are in competition with other people following common wisdom)
In those cases, you may need to develop strategies from the ground up. Your initial attempts may actually be worse than the common wisdom. But in the long term, if you can acquire a gears-level understanding of yourself, the world, and other agents, you might eventually outperform the default strategies.
Elements of Robust Agency
I think of Robust Agency as having a few components. This is not exhaustive, but an illustrative overview:
Deliberate Agency
Gears-level-understanding of yourself
Coherence and Consistency
Game Theoretic Soundness
Deliberate Agency
First, you need to decide to be any kind of deliberate agent at all. Don’t just go along with whatever kludge of behaviors evolution and your social environment cobbled together. Instead, make conscious choices about your goals and decision procedures that you reflectively endorse.
Gears Level Understanding of Yourself
In order to reflectively endorse your goals and decisions, it helps to understand your goals and decisions, as well as intermediate parts of yourself. This requires many subskills, such as the ability to introspect, or to make changes to how your decision making works.
(Meanwhile, it also helps to understand how your decisions interface with the rest of the world, and with the people you interact with. Gears-level understanding is generally useful. Scientific and mathematical literacy helps you validate your understanding of the world.)
Coherence and Consistency
If you want to lose weight and also eat a lot of ice cream, that’s a valid set of human desires. But, well, it might just be impossible.
If you want to make long term plans that require commitment but also want the freedom to abandon those plans whenever, you may have a hard time. People you made plans with might get annoyed.
You can make deliberate choices about how to resolve inconsistencies in your preferences. Maybe you decide “actually, losing weight isn’t that important to me”, or maybe you decide that you want to keep eating all your favorite foods but also cut back on overall calorie consumption.
The “commitment vs freedom” example gets at a deeper issue – each of those opens up a set of broader strategies, some of which are mutually exclusive. How you resolve the tradeoff will shape what future strategies are available to you.
There are benefits to reliably being able to make trades with your future-self, and with other agents. This is easier if your preferences aren’t contradictory, and easier if your preferences are either consistent over time, or at least predictable over time.
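To make the cost of contradictory preferences concrete, here is a minimal money-pump sketch (my illustration, not from the post; the goods, prices, and fee are made up). An agent whose preferences form a cycle will pay a small fee for each “upgrade” around the cycle, ending up with exactly what it started with, minus money:

```python
# Minimal money-pump sketch: an agent with cyclic (intransitive) preferences
# can be drained of money by a trader offering "upgrades" around the cycle.

# The agent prefers A over B, B over C, and C over A -- a preference cycle.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}

def accepts_trade(current, offered):
    """The agent pays a small fee for any item it prefers to its current one."""
    return (offered, current) in prefers

agent_item, agent_money = "C", 100
trade_fee = 1

# The trader walks the agent around the preference cycle three times.
for offered in ["B", "A", "C"] * 3:
    if accepts_trade(agent_item, offered):
        agent_item = offered
        agent_money -= trade_fee

print(agent_item, agent_money)  # -> C 91: same item as before, nine dollars poorer
```

Each individual trade looks locally fine to the agent; it’s the policy as a whole that is exploitable. That’s the sense in which consistent (or at least predictable) preferences make you harder to pump and easier to trade with.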
Game Theoretic Soundness
There are other agents out there. Some of them have goals orthogonal to yours. Some have common interests with you, and you may want to coordinate with them. Others may be actively harming you and you need to stop them.
They may vary in…
What their goals are.
What their beliefs and strategies are.
How much they’ve thought about their goals.
Where they draw their circles of concern.
How hard (and how skillfully) they’re trying to be game theoretically sound agents, rather than just following local incentives.
Being a robust agent means taking that into account. You must find strategies that work in a messy, mixed environment with confused allies, active adversaries, and sometimes people who are a little bit of both. (This includes creating credible incentives and punishments to deter adversaries from bothering you, and motivating allies to become less confused).
Related to this is legibility. Your gears-level-model-of-yourself helps you improve your own decision making. But it also lets you clearly expose your policies to other people. This can help with trust and coordination. If you have a clear decision-making procedure that makes sense, other agents can validate it, and then you can tackle more interesting projects together.
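As a toy illustration of legibility (my sketch, not the author’s): a policy like tit-for-tat is short enough to publish in full, so another agent can read it, verify it, and predict exactly how you’ll respond:

```python
# A legible policy: tit-for-tat in an iterated prisoner's dilemma.
# The entire decision procedure fits in a few lines, so other agents
# can validate it rather than having to guess at your intentions.

def tit_for_tat(their_history):
    """Cooperate first; afterwards, mirror the other agent's last move."""
    if not their_history:
        return "cooperate"
    return their_history[-1]  # "cooperate" or "defect"

# An agent reading this policy knows defection is punished exactly once,
# and cooperation is restored as soon as they cooperate again.
my_moves, their_moves = [], []
for their_move in ["cooperate", "defect", "cooperate", "cooperate"]:
    my_moves.append(tit_for_tat(their_moves))
    their_moves.append(their_move)

print(my_moves)  # -> ['cooperate', 'cooperate', 'defect', 'cooperate']
```

The specific policy matters less than the property: because the procedure is simple and exposed, others can trust it without having to trust you blindly.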
Examples
Here’s a smattering of things I’ve found helpful to think about through this lens:
Be the sort of person that Omega can clearly tell is going to one-box – even a version of Omega who’s only 90% accurate. (A worked expected-value calculation follows this list.) Or, less exotically: Be the sort of person who your social network can clearly see is worth trusting with sensitive information, or with power. Deserve Trust.
Be the sort of agent who cooperates when it is appropriate, defects when it is appropriate, and can realize that cooperating-in-this-particular-instance might look superficially like defecting, but avoid falling into the trap.
Think about the ramifications of people who think like you adopting the same strategy. Not as a cheap rhetorical trick to get you to cooperate on every conceivable thing. Actually think about how many people are similar to you. Actually think about the tradeoffs of worrying about a given thing. (Is recycling worth it? Is cleaning up after yourself at a group house? Is helping a person worth it? The answer actually depends; don’t pretend otherwise.)
If there isn’t enough incentive for others to cooperate with you, you may need to build a new coordination mechanism so that there is enough incentive. Complaining or getting angry about it can sometimes serve as an incentive, but it often doesn’t work, and/or incentivizes something other than what you meant. (Be conscious of the opportunity costs of building this coordination mechanism instead of other ones. Be conscious of trying and failing to build a coordination mechanism. Mindshare is only so big.)
Be the sort of agent who, if some AI engineers were whiteboarding out the agent’s decision making, they would see that the agent makes robustly good choices, such that those engineers would choose to implement that agent as software and run it.
Be cognizant of order-of-magnitude. Prioritize (both for things you want for yourself, and for large scale projects shooting for high impact).
Do all of this realistically given your bounded cognition. Don’t stress about implementing a game-theoretically perfect strategy, but do be cognizant of how much computing power you actually have (and periodically reflect on whether your cached strategies can be re-evaluated given new information or more time to think). If you’re being simulated on a whiteboard right now, have at least a vague, credible notion of how you’d think better if given more resources.
Do all of this realistically given the bounded cognition of *others*. If you have a complex strategy that involves rewarding or punishing others in highly nuanced ways… and they can’t figure out what your strategy is, you may just be adding random noise rather than running a clear coordination protocol.
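Here is the worked expected-value calculation mentioned in the first example above – a minimal sketch assuming the standard Newcomb payoffs ($1,000,000 in the opaque box iff Omega predicted one-boxing; $1,000 always in the transparent box). These numbers are the conventional ones, not anything specified in the post:

```python
# Expected value of one-boxing vs. two-boxing against a 90%-accurate Omega,
# using the standard Newcomb payoffs (conventional numbers, assumed here).
accuracy = 0.9
big_box = 1_000_000   # filled only if Omega predicted you would one-box
small_box = 1_000     # always contains $1,000

# If you one-box, Omega predicted it with probability `accuracy`.
ev_one_box = accuracy * big_box
# If you two-box, the big box is full only when Omega mispredicted you.
ev_two_box = (1 - accuracy) * big_box + small_box

print(ev_one_box)  # 900000.0
print(ev_two_box)  # 101000.0 -- one-boxing wins even at 90% accuracy
```

The arithmetic isn’t really the point; the point is that being *legibly* the kind of agent who one-boxes is what makes the favorable prediction (and payoff) happen.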
Why is this important?
If you are a maximizer, trying to do something hard, it’s hopefully a bit obvious why this is important. It’s hard enough to do hard things without having incoherent exploitable policies and wasted motion chasing inconsistent goals.
If you’re a satisficer, and you’re basically living your life pretty chill and not stressing too much about it, it’s less obvious that becoming a robust, coherent agent is useful. But I think you should at least consider it, because...
The world is unpredictable
The world is changing rapidly, due to cultural clashes as well as new technology. Common wisdom can’t handle the 20th century, let alone the 21st, let alone a singularity.
I feel comfortable making the claim: Your environment is almost certainly unpredictable enough that you will benefit from a coherent approach to solving novel problems. Understanding your goals and your strategy is vital.
There are two main reasons I can see to not prioritize the coherent agent strategy:
1. There may be higher near-term priorities.
You may want to build a safety net, to give yourself enough slack to freely experiment. It may make sense to first do all the obvious things to secure a job, enough money, and social support. (That is, indeed, what I did.)
I’m not kidding when I say that building your decision making from the ground up can leave you worse off in the short term. The valley of bad rationality be real, yo. See this post for some examples of things to watch out for.
Becoming a coherent agent is useful, but if you don’t have a general safety net, I’d prioritize that first.
2. Self-reflection and self-modification are hard.
It requires a certain amount of mental horsepower, and some personality traits that not everyone has, including:
Social resilience and openness-to-experience (necessary to try nonstandard strategies).
Something like ‘stability’ or ‘common sense’ (I’ve seen some people try to rebuild their decision theory from scratch and end up hurting themselves).
In general, the ability to think on purpose, and do things on purpose.
If you’re the sort of person who ends up reading this post, I think you are probably the sort of person who would benefit (someday, from a position of safety/slack) from attempting to become more coherent, robust and agentic.
I’ve spent the past few years hanging around people who are more agentic than me. It took a long while to really absorb their worldview. I hope this post gives others a clearer idea of what this path might look like, so they can consider it for themselves.
Game Theory in the Rationalsphere
That said, the reason I was motivated to write this wasn’t to help individuals. It was to help with group coordination.
The EA, Rationality and X-Risk ecosystems include lots of people with ambitious, complex goals. They have many common interests and should probably be coordinating on a bunch of stuff. But they disagree on many facts, and strategies. They vary in how hard they’ve tried to become game-theoretically-sound agents.
My original motivation for writing this post was that I kept seeing what seemed to me to be strategic mistakes in coordination. People were acting as if the social landscape were more uniform than it actually is, and expecting others to be on the same “meta-page” about how to resolve coordination failures.
But then I realized that I’d been implicitly assuming something like “Hey, we’re all trying to be robust agents, right? At least kinda? Even if we have different goals and beliefs and strategies?”
And that wasn’t obviously true in the first place.
I think it’s much easier to coordinate with people if you are able to model each other. If people have common knowledge of a shared meta-strategic-framework, it’s easier to discuss strategy and negotiate. If multiple people are trying to make their decision-making robust in this way, that hopefully can constrain their expectations about when and how to trust each other.
And if you aren’t sharing a meta-strategic-framework, that’s important to know!
So the most important point of this post is to lay out the Robust Agent paradigm explicitly, with a clear term I could quickly refer to in future discussions, to check “is this something we’re on the same page about, or not?” before continuing on to discuss more complicated ideas.
Bumping this up to two nominations not because I think it needs a review, but because I like it and it captures an important insight that I’ve not seen written up like this elsewhere.
In my own life, these insights have led me to do (or consider doing) things like:
not sharing private information even with my closest friends – so that they know, going forward, that I’m the kind of agent who can keep important information confidential (notice the counterincentive: in the moment, sharing secrets makes you feel like you have a stronger bond with someone, even though in the long run it is evidence to them that you are less trustworthy)
building robustness between past and future selves (e.g. if I had planned an exciting rest day, but then started that day working and got really excited by the work, choosing to stop working and rest anyway, so that different parts of me learn that I can make and keep inter-temporal deals (even if work seems higher-EV in the moment))
being more angry with friends (on the margin) – to demonstrate that I have values and principles and will defend them in a predictable way, making it easier to coordinate with and trust me in the future (and making it easier for me to trust others, knowing I’m capable of acting robustly to defend my values)
thinking about, in various domains, “What would be my limit here? What could this person do such that I would stop trusting them? What could this organisation do such that I would think their work is net negative?” and then looking back at those principles to see how things turned out
not sharing passwords with close friends, even for one-off things – not because I expect them to release or lose them, but simply because it would be a security flaw that makes them more vulnerable to anyone wanting to get to me. It’s a very unlikely scenario, but I’m choosing to adopt a robust policy across cases, and it seems like useful practice.
This post has helped me understand the mindset of a certain subset of rationalists quite a bit, and being able to point to it, and to my disagreements with it, has been quite helpful in finding cruxes in disagreements.