Safety regulators: A tool for mitigating technological risk
Crossposted to the Effective Altruism Forum
So far the idea of differential technological development has been discussed in ways that emphasize (1) ratios of progress rates, (2) ratios of remaining work, (3) maximizing or minimizing correlations (for example, minimizing the overlap between the capability to do harm and the desire to do so), (4) implementing safe technology before developing and implementing unsafe technology, and (5) the occasional niche analysis (possibly see also a complementary aside relating differential outcomes to growth rates in the long run). I haven’t seen much work on how various capabilities (a generalization of technologies) may interact with each other in ways that prevent downside effects (though see also The Vulnerable World Hypothesis), and I wish to elaborate on this interaction type.
As technology improves, our capacity to do both harm and good increases, and each additional capacity unlocks new capacities that can be implemented. For example, the invention of engines unlocked railroads, which in turn unlocked more efficient trade networks. However, the invention of engines also enabled the construction of mobile war vehicles. How, in an ideal world, could we implement capacities so that we get the outcomes we want while creating minimal harm and risk in the process?
What does implementing a capacity do? It enables us to change something. A normal progression is:
1. We have no control over something (e.g. we cannot generate electricity).
2. We have control, but our choices are noisy and partially random (e.g. we can produce electric sparks on occasion but don’t know how to use them).
3. Our choices are organized, but there are still downside effects (e.g. we can channel electricity to our homes, but occasionally people get electrocuted or fires start).
4. Our use of the technology mostly doesn’t have downside effects (e.g. we have capable safety regulators (insulation, fuses, ...) that allow us to minimize fire and electrocution risks).
The problem is that downside effects in stages 2 and 3 could overwhelm the value achieved during those stages and at stage 4, especially when considering powerful game changing technologies that could lead to existential risks.
Even more fundamentally, as agents in the world we want to avoid shifting expected utility in a negative direction relative to other options (the opportunity costs). We want to implement new capacities in the best sequence, as with any other plan, so as to maximize the value we achieve. But value is a property of an entire plan, and that is harder to reason about than the question of what the optimal (or safe) next step is (ignoring what is done after). We wish to make choosing which capacities to develop more manageable and easier to think about. One way to do this is to make sure that each capacity we implement is immediately an improvement relative to the state we were in before implementing it (this simplification is an example of a greedy algorithm heuristic). What does this simplification imply about the sequence in which we implement capacities?
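The greedy heuristic can be sketched as a single decision step. This is only an illustration, not the post's formal proposal; the function name and the idea of scoring candidates with a `value_after` function are my assumptions:

```python
def greedy_step(state_value, candidates, value_after):
    """Return the candidate capacity whose immediate expected value is
    highest, but only if implementing it beats staying put; otherwise
    return None. A one-step greedy choice, not a full plan optimizer."""
    best = max(candidates, key=value_after, default=None)
    if best is not None and value_after(best) > state_value:
        return best
    return None
```

The key property is the guard clause: a capacity is implemented only when it is an immediate improvement over the current state, which is exactly what makes the heuristic greedy (and, as noted below, not necessarily optimal).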
This implies that what we want to do is to have the capacities so we may do good without the downside effects and risks of those capacities. How do we do this? If we’re lucky the capacity itself has no downside risks, and we’re done. But if we’re not lucky we need to implement a regulator on that capacity: a safety regulator. Let’s define a safety regulator as a capacity that helps control other capacities to mitigate their downside effects. Once a capacity has been fully safety regulated, it is then unlocked and we can implement it to positive effect.
Some distinctions we want to pay attention to are then:
A capacity—a technology, resource, or plan that changes the world either autonomously or by enabling us to use it
An implemented capacity—a capacity that has already been put into effect
An available capacity—a capacity that can be implemented immediately
An unlocked capacity—a capacity that is safe and beneficial to implement given the technological context, and is also available
Potential capacities—the set of all possible capacities: those already implemented, those being worked on, those that are available, and those that exist in theory but need prerequisite capacities to be implemented first.
A safety regulator—a capacity that unlocks other capacities, by mitigating downside effects and possibly providing a prerequisite. (The safety regulator may or may not be unlocked itself at this stage—you may need to implement other safety regulators or capacities to unlock it). Generally, safety regulators are somewhat specialized for the specific capacities they unlock.
Running the suggested heuristic strategy then looks like: If a capacity is unlocked, then implement it; otherwise, implement either an unlocked safety regulator for it first or choose a different capacity to implement. We could call this a safety regulated capacity expanding feedback loop. For instance, with respect to nuclear reactions humanity (1) had the implemented capacity of access to radioactivity, (2) this made available the safety regulator of controlling chain reactions, (3) determining how to control chain reactions was implemented (through experimentation and calculation), (4) this unlocked the capacity to use chain reactions (in a controlled fashion), (5) and the capacity of using chain reactions was implemented.
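The feedback loop above can be sketched in code. The class and function names are illustrative (not from the post), and capacities are reduced to two dependency sets: prerequisites that make a capacity available, and safety regulators that unlock it:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capacity:
    name: str
    prerequisites: frozenset = frozenset()      # names of capacities needed first
    safety_regulators: frozenset = frozenset()  # names of regulators that unlock it

def unlocked(c, implemented):
    """Available (all prerequisites implemented) and fully safety regulated."""
    return c.prerequisites <= implemented and c.safety_regulators <= implemented

def expand(potential, implemented=frozenset()):
    """The feedback loop: implement any unlocked capacity, repeat until
    nothing more can be safely implemented."""
    implemented = set(implemented)
    progress = True
    while progress:
        progress = False
        for c in potential:
            if c.name not in implemented and unlocked(c, implemented):
                implemented.add(c.name)
                progress = True
    return implemented

# The nuclear example: radioactivity makes chain-reaction control available,
# and that safety regulator in turn unlocks the use of chain reactions.
radioactivity = Capacity("radioactivity")
control = Capacity("chain reaction control",
                   prerequisites=frozenset({"radioactivity"}))
chain_reactions = Capacity("chain reactions",
                           prerequisites=frozenset({"radioactivity"}),
                           safety_regulators=frozenset({"chain reaction control"}))
```

Note that `expand` never implements a locked capacity: if the regulator is absent from the potential set, the capacity it guards simply stays locked, which mirrors the "choose a different capacity" branch of the heuristic.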
Limitations and extensions to this method:
It’s difficult to tell which of the unlocked capacities to implement at a particular step. But we’ll assume some sort of decision process exists for optimizing that.
Capacities may be good temporarily, but if other capacities are not implemented in time, they may become harmful (see the loss unstable states idea).
Implementing capacities in this way isn’t necessarily optimal because this approach does not allow for temporary bad effects that yield better results in the long run.
Capacities do not necessarily stay unlocked forever due to interactions with other capacities that may be implemented in the interim.
A locked capacity may be net good to implement if a safety regulator is implemented before the downside effects could take place (this is related to handling cluelessness).
The detailed interaction between capacities and planning which to develop in which order resembles the type of problem the TWEAK planner was built for and it may be one good starting point for further research.
In more detail, how can one capacity prevent the negative effects of another?
How do you deal with the knowledge problem? Typically, the actual, experienced pain in steps 2 and 3 is critical to the safety measures implemented in 3 and enjoyed in 4. The progress is not delayed for all possible problems, but the worst of them get addressed—the incentive to be safe (reduce pain) aligns with the incentive to use the technology at all.
This works for pain (risk that’s short-term enough to measure the cost and incidence of). It’s not clear that it works for rarer but more severe risks (x-risk or just giant economic risk).
In other words, the regulators are part of the technology in the first place—what’s the guarantee (or even the mechanism to start) that the regulators are addressing only the critical risks?
How do you see the safety regulator model working in a case like bridges, where safety is already part of the primary function of the system? A bridge is built to optimize for getting people across a gap they couldn’t otherwise cross, and being better at being a bridge (getting more people across) means being safer (fewer people fail to make it across for deadly reasons). It’s not entirely clear where to draw the line demarcating a safety regulator in such cases, where safety is naturally part of the function.