The last era of human mistakes

Suppose we had to make moves in a high-stakes chess game, with thousands of lives at stake. We wouldn’t just find a good chess player and ask them to play carefully. We would consult a computer. It would be deeply irresponsible to do otherwise. Computers are better than humans at chess, and more reliable.

We’d probably still keep some good chess players in the loop, to try to catch possible computer error. (Similarly, we still have pilots for planes, even though the autopilot is often safer.) But by consulting the computer we’d remove the opportunity for humans to make a certain type of high-stakes mistake.

A lot of the high stakes decisions people make today don’t look like chess, or flying a plane. They happen in domains where computers are much worse than humans.

But that’s a contingent fact about our technology level. If we had sufficiently good AI systems, they could catch and prevent significant human errors in whichever domains we wanted them to.

In such a world, I think that they would come to be employed for just about all suitable and important decisions. If some actors didn’t take advice from AI systems, I would expect them to lose power over time to actors who did. And if public institutions were making consequential decisions, I expect that it would (eventually) be seen as deeply irresponsible not to consult computers.

In this world, humans could still be responsible for taking decisions (with advice). And humans might keep closer to sole responsibility for some decisions: perhaps deciding what, ultimately, is valued. Many less consequential decisions, which can still be large at the scale of an individual’s life (such as who to marry, where to live, or whether to have children), might also be deliberately kept under human control^[1].

Such a world might still collapse. It might face external challenges which were just too difficult. But it would not fail because of anything we would parse as foolish errors.

In many ways I’m not so interested in that era. It feels out of reach. Not that we won’t get there, but that there’s no prospect for us to help the people of that era to navigate it better.

My attention is drawn, instead, to the period before it. This is a time when AI will (I expect) be advancing rapidly. Important decisions may be made in a hurry. And while automation-of-advice will be on the up, it seems like wildly unprecedented situations will be among the hardest things to automate good advice for. We might think of it as the last era of consequential human mistakes^[2].

Can we do anything to help people navigate that era? I honestly don’t know. It feels very difficult (given how hard it is, at our remove, even to identify the challenges properly). But it doesn’t feel obviously impossible.

What will this era look like?

Perhaps AI progress is blisteringly fast and we move from something like the world of today straight to a world where human mistakes don’t matter. But I doubt it.

On my mainline picture of things, this era — the final one in which human incompetence (and hence human competence) really matters — might look something like this:

- Cognitive labour approaching the level of human thinking in many domains is widespread, and cheap
  - People are starting to build elaborate ecosystems leveraging its cheapness …
    - … since if one of the basic inputs to the economy is changed, the optimal arrangement of things is probably quite different (cf. the ecosystem of things built on the internet);
    - … but that process hasn’t reached maturity.
  - There is widespread access to standard advice, which helps to avoid some foolish errors, though this is only applicable to “standard” situations, and not everyone seeks that advice
- In some domains, AI performance is significantly better than human performance
  - These tend to be domains with good feedback loops, which are better targets for automation
  - This includes some parts of research (and research is correspondingly speeding up), but not all
  - This includes some power-seeking moves (but not others)
    - Humans employing AI eat up most or all of the free energy of good automated power-seeking strategy/tactics, so this doesn’t immediately create an instability where AI actors can amass large amounts of power
- A lot of people’s jobs are at risk, but inertia means that in many cases the jobs persist longer than they need to
  - In any case it’s not (at this point in time) a case of mass human unemployment; rather, people are moving into new opportunities:
    - Doing the most interesting parts of their jobs and using AI tools to automate a lot of the rest
    - Doing manual labour of various types, with AI providing on-the-job training and assistance
- In dealing with importantly unprecedented situations (which includes parts of research, and choosing strategy for a changing world in a forward-looking way), AI is worse than the top humans
  - It may well be better than many humans, but the lack of feedback loops means it’s hard to tell, and people’s trust falls back a good amount on their priors

That’s enough predictions that I’m probably wrong about some of the particulars. But I think the broad-brush picture is decently likely.

Central challenges to be borne by humans

What kind of challenges will people actually face at this time?

This is difficult to be particularly confident about. But here are some thoughts:

- If the players on the gameboard will not make errors after this era, the challenge of the time will be setting up the gameboard well, on dimensions like:
  - Who the players are (their values and temperaments)
    - In addition to humans, it matters what AI systems, and what institutions, we create
  - How much power, and of what sorts, the various players have
  - The social equilibrium (maybe)
    - Are there, e.g., prohibitions on certain types of action?
    - It’s unclear whether there’s a lot of path dependency here
  - The technological position (maybe)
    - What technologies are available could determine the strategic position
    - Which research can be easily automated could determine what the future technological landscape looks like
    - This might not have so much influence if there’s some kind of grand bargain between the players
- By default, I expect effective automation of good advice for power-seeking actions to come earlier than effective automation of good advice for values-shaping actions (like choosing personal values that you’d later endorse, or like working to make large institutions have particular values)
  - This intuition feels vague (like it’s not grounded in a particular concrete story), so there’s definitely space for it to be wrong
    - The vibe of the intuition is like “power seeking has good feedback loops, and things with good feedback loops tend to get automated earlier”
  - This could mean that there’s useful work to be done in helping prepare people to handle the values-shaping parts
  - This is least true for handling high-stakes unprecedented situations that have implications for the distribution of power — dealing with unprecedented situations seems likely to be at the hard end of things-to-automate
- Background equilibria may be changing fast, as AI disrupts many parts of society
  - Cognitive resources that were previously expensive may become cheap (perhaps in worse forms)
    - cf. translation, artwork today
  - The rapidity could demand accelerated processes for finding new good equilibria which stop things somehow-or-other going off the rails
  - In some cases, the state of technology might differentially favour destabilizing equilibria, including perhaps on a military front
- One currently-unprecedented scenario (which might in the future have good precedents) is accidentally ceding power to newly created intelligent systems
  - Efforts to help with this already receive a significant amount of attention
    - Avoiding the accidental creation of systems with undesired values seems to more or less correspond with AI alignment
    - Avoiding accidentally ceding power to such systems seems to more or less correspond with AI control
  - In both cases I think the most useful work today is about laying the groundwork for future automation of the research
    - An important component of this is conceptual research, getting clarity on what things would even be automated

Trying to help at far remove

Even if we have some sense of their challenges and desire to help — what can we do? A central difficulty is that, however much we can get a sense of their challenges, their own sense of the challenges will be much better. It is inefficient for us to focus too much on specific scenarios^[3]. A related issue is that they will have better tools than we do — some work we might want to do could by then be automated.

I don’t know how to think about this systematically, so I may well be missing things. But for now, there are three strategies which seem to me to have some promise — one about helping the future players to act wisely, and two about helping to get the gameboard in a good position.

First, deepening understanding of foundational matters. Having a good grounding in the basics (both theoretical and empirical) seems helpful for understanding all sorts of situations. Our distance puts us at some disadvantage, since we don’t know which areas of foundations will be most relevant; but the space of possible foundations is much, much smaller than the space of possible applications, and we can make some educated guesses. In this case that means analysis of the nature of AI, of the senses in which different actors might have values, of the basic dynamics of game theory or bargaining in cases with partial information and partially defined preferences, and so forth. It seems to me that although we have models of all of these things, they don’t always feel like they’re capturing all the important things. I wouldn’t be surprised if improvements in these foundations were possible, were helpful, and were counterfactual (through the relevant moments).

Second, power seeking on behalf of values one likes. This can include trying to shape the values of various actors, or trying to empower actors with desirable values. Honestly I’m pretty nervous about this one, because (1) it’s so common and human for people to delude themselves into thinking that their values are superior, even when they’re not, and (2) society has good memetic immune responses against various types of power seeking, so it can be easy for this to backfire. But it definitely is a strategy which can work at this distance, and it has some types of robustness (it doesn’t rely on second-guessing future actors, but is just about setting the gameboard up well). I feel relatively less worried about versions of this which are focused on fundamental values like cooperativeness and a commitment to moral reflection and truth-seeking, and more worried about versions predicated on particular object-level views about which values are correct.

Third, differential technological development. It seems quite possible that the position people are in will depend in various ways on the state of technologies. Work which helps desirable technological pathways arrive sooner relative to less desirable ones seems like a good lever. This can include (as e.g. in the cases of AI alignment and control) work laying the groundwork for future automation of research, including conceptual work helping to inform what things, exactly, are good to automate. Differential technological development, as well as being a strategy in its own right (aiming to positively influence the tech available during the last era of human mistakes), can also be a tactic in service of the two other strategies above — e.g. perhaps differentially advancing research which helps us to think clearly about big novel issues.

What to make of this

Framing in terms of the last era of human mistakes feels to me like it’s capturing some important dynamics (although it may be confused about others). I feel glad to have found the perspective, and to get to interrogate it. It helps to remind me how strange the future will be. And it seems like it provides some seeds which I may later find helpful for my thinking.

At the same time, as of the time of writing I’m not sure how much this perspective will help. It shifts my view of things, but it doesn’t make it very transparent what to do. Still, I felt like there was enough here to be worth sharing. If other people find the perspective useful, or not-useful, I’d be interested to hear about that.

^[1] Or not — there are possible futures where humans are removed from decision loops altogether.

^[2] I’ve sometimes heard this period, or something close to it, called “crunch time”. I mildly dislike that name because although it points to the importance of the period it sort of obscures the mechanisms via which it’s important.

^[3] Although it often seems to be very productive to explore specific scenarios, to help keep general thinking grounded.