Safety regulators: A tool for mitigating technological risk
Crossposted to the Effective Altruism Forum
So far the idea of differential technological development has been discussed in ways that emphasize (1) ratios of progress rates, (2) ratios of remaining work, (3) maximizing or minimizing correlations (for example, minimizing the overlap between the capability to do harm and the desire to do so), (4) implementing safe technology before developing and implementing unsafe technology, and (5) the occasional niche analysis (possibly see also a complementary aside relating differential outcomes to growth rates in the long run). I haven’t seen much work on how various capabilities (a generalization of technologies) may interact with each other in ways that prevent downside effects (though see also The Vulnerable World Hypothesis), and I wish to elaborate on this interaction type.
As technology improves, our capacity to do both harm and good increases, and each additional capacity unlocks new capacities that can be implemented. For example, the invention of engines unlocked railroads, which in turn unlocked more efficient trade networks. However, the invention of engines also enabled the construction of mobile war vehicles. How, in an ideal world, could we implement capacities so that we get the outcomes we want while creating minimal harm and risk in the process?
What does implementing a capacity do? It enables us to change something. A normal progression is:
1. We have no control over something (e.g. we cannot generate electricity).
2. We have control, but our choices are noisy and partially random (e.g. we can produce electric sparks on occasion but don’t know how to use them).
3. Our choices are organized, but there are still downside effects (e.g. we can channel electricity to our homes, but occasionally people get electrocuted or fires start).
4. Our use of the technology mostly doesn’t have downside effects (e.g. we have capable safety regulators (insulation, fuses, ...) that allow us to minimize fire and electrocution risks).
The problem is that downside effects in stages 2 and 3 could overwhelm the value achieved during those stages and at stage 4, especially when considering powerful game changing technologies that could lead to existential risks.
Even more fundamentally, as agents in the world we want to avoid shifting expected utility in a negative direction relative to other options (the opportunity costs). We want to implement new capacities in the best sequence, as with any other plan, so as to maximize the value we achieve. But value is a property of an entire plan, and that is harder to reason about than the question of what the optimal (or safe) next step is (ignoring what is done after). We wish to make choosing which capacities to develop more manageable and easier to think about. One way to do this is to make sure that each capacity we implement is immediately an improvement relative to the state we were in before implementing it (this simplification is an example of a greedy algorithm heuristic). What does this simplification imply about the sequence in which we implement capacities?
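The greedy heuristic can be sketched as a single decision step. This is only an illustration, not the post's formal proposal; the function name and the idea of scoring candidates with a `value_after` function are my assumptions:

```python
def greedy_step(state_value, candidates, value_after):
    """Return the candidate capacity whose immediate expected value is
    highest, but only if implementing it beats staying put; otherwise
    return None. A one-step greedy choice, not a full plan optimizer."""
    best = max(candidates, key=value_after, default=None)
    if best is not None and value_after(best) > state_value:
        return best
    return None
```

The key property is the guard clause: a capacity is implemented only when it is an immediate improvement over the current state, which is exactly what makes the heuristic greedy (and, as noted below, not necessarily optimal).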
This implies that what we want to do is to have the capacities so we may do good without the downside effects and risks of those capacities. How do we do this? If we’re lucky the capacity itself has no downside risks, and we’re done. But if we’re not lucky we need to implement a regulator on that capacity: a safety regulator. Let’s define a safety regulator as a capacity that helps control other capacities to mitigate their downside effects. Once a capacity has been fully safety regulated, it is then unlocked and we can implement it to positive effect.
Some distinctions we want to pay attention to are then:
A capacity—a technology, resource, or plan that changes the world either autonomously or by enabling us to use it
An implemented capacity—a capacity that has already been put into effect
An available capacity—a capacity that can be implemented immediately
An unlocked capacity—a capacity that is safe and beneficial to implement given the technological context, and is also available
Potential capacities—the set of all possible capacities: those already implemented, those being worked on, those that are available, and those that exist in theory but need prerequisite capacities to be implemented first.
A safety regulator—a capacity that unlocks other capacities, by mitigating downside effects and possibly providing a prerequisite. (The safety regulator may or may not be unlocked itself at this stage—you may need to implement other safety regulators or capacities to unlock it). Generally, safety regulators are somewhat specialized for the specific capacities they unlock.
Running the suggested heuristic strategy then looks like: If a capacity is unlocked, then implement it; otherwise, implement either an unlocked safety regulator for it first or choose a different capacity to implement. We could call this a safety regulated capacity expanding feedback loop. For instance, with respect to nuclear reactions humanity (1) had the implemented capacity of access to radioactivity, (2) this made available the safety regulator of controlling chain reactions, (3) determining how to control chain reactions was implemented (through experimentation and calculation), (4) this unlocked the capacity to use chain reactions (in a controlled fashion), (5) and the capacity of using chain reactions was implemented.
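The feedback loop above can be sketched in code. The class and function names are illustrative (not from the post), and capacities are reduced to two dependency sets: prerequisites that make a capacity available, and safety regulators that unlock it:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capacity:
    name: str
    prerequisites: frozenset = frozenset()      # names of capacities needed first
    safety_regulators: frozenset = frozenset()  # names of regulators that unlock it

def unlocked(c, implemented):
    """Available (all prerequisites implemented) and fully safety regulated."""
    return c.prerequisites <= implemented and c.safety_regulators <= implemented

def expand(potential, implemented=frozenset()):
    """The feedback loop: implement any unlocked capacity, repeat until
    nothing more can be safely implemented."""
    implemented = set(implemented)
    progress = True
    while progress:
        progress = False
        for c in potential:
            if c.name not in implemented and unlocked(c, implemented):
                implemented.add(c.name)
                progress = True
    return implemented

# The nuclear example: radioactivity makes chain-reaction control available,
# and that safety regulator in turn unlocks the use of chain reactions.
radioactivity = Capacity("radioactivity")
control = Capacity("chain reaction control",
                   prerequisites=frozenset({"radioactivity"}))
chain_reactions = Capacity("chain reactions",
                           prerequisites=frozenset({"radioactivity"}),
                           safety_regulators=frozenset({"chain reaction control"}))
```

Note that `expand` never implements a locked capacity: if the regulator is absent from the potential set, the capacity it guards simply stays locked, which mirrors the "choose a different capacity" branch of the heuristic.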
Limitations and extensions to this method:
It’s difficult to tell which of the unlocked capacities to implement at a particular step. But we’ll assume some sort of decision process exists for optimizing that.
Capacities may be good temporarily, but if other capacities are not implemented in time, they may become harmful (see the loss unstable states idea).
Implementing capacities in this way isn’t necessarily optimal because this approach does not allow for temporary bad effects that yield better results in the long run.
Capacities do not necessarily stay unlocked forever due to interactions with other capacities that may be implemented in the interim.
A locked capacity may be net good to implement if a safety regulator is implemented before the downside effects could take place (this is related to handling cluelessness).
The detailed interaction between capacities and planning which to develop in which order resembles the type of problem the TWEAK planner was built for and it may be one good starting point for further research.
In more detail, how can one capacity prevent the negative effects of another?
How do you deal with the knowledge problem? Typically, the actual, experienced pain in steps 2 and 3 is critical to the safety measures implemented in 3 and enjoyed in 4. The progress is not delayed for all possible problems, but the worst of them get addressed—the incentive to be safe (reduce pain) aligns with the incentive to use the technology at all.
This works for pain (risk that’s short-term enough to measure the cost and incidence of). It’s not clear that it works for rarer but more severe risks (x-risk or just giant economic risk).
In other words, the regulators are part of the technology in the first place—what’s the guarantee (or even the mechanism to start) that the regulators are addressing only the critical risks?
How do you see the safety regulator model working in a case like bridges, where safety is already part of the primary function of the system? A bridge is built to optimize for getting people across a gap they couldn’t otherwise cross, and being better at being a bridge (getting more people across) means being safer (fewer people fail to make it across for deadly reasons). It’s not entirely clear where to draw the line demarcating a safety regulator in such cases, where safety is naturally part of the function.