Slowed ASI—a possible technical strategy for alignment

Lately there has been much discussion of pausing AI development (PauseAI), or even stopping research completely, until further progress has been made in alignment theory or technique. After thinking about this for some time, I wondered whether there was a way to formalize the reasoning in mathematical terms, and stumbled upon what might be an interesting, possibly novel approach to alignment:

What if we leveraged inherently slower computing substrates to run AI at a slower pace than today’s digital computers?

By “slower substrate”, I don’t just mean reducing CPU/GPU clock speeds, core counts, or RAM/VRAM. I mean choosing fundamentally slower forms of computing that cannot be sped up past some performance ceiling. In this way, we might be able to run and validate stronger-than-human AI in human time. Here’s an intuition pump for why it could work.

Total Intelligence and the Three Forms Of Superintelligence

In his book Superintelligence, Nick Bostrom categorizes superintelligence into three forms:

  1. Quality Superintelligence: Intelligence derived from the sophistication and effectiveness of thinking processes.

  2. Speed Superintelligence: Intelligence stemming from the rapidity of thought processes.

  3. Collective Superintelligence: Intelligence emerging from the coordinated efforts of multiple agents working in parallel.

Note that a given AI might attain superhuman performance through any one of these three forms, or some combination of them. This suggests factoring cognition into a kind of Total Intelligence Equation, in which Quality, Speed, and Number of Agents in coordination are each directly proportional to Total Intelligence:

Total Intelligence = Quality × Speed × Number of Agents

The Slower Superintelligence Mental Model

Given this relationship, it stands to reason that we might safely evaluate a high-quality superintelligence by diminishing its speed or collective capacity. In other words, if Total Intelligence is the product of Quality, Speed, and Number of Agents, then a feasible approach might be to increase the Quality factor while decreasing Speed and Number of Agents proportionally or more. The result is a high-quality but controllable form of superintelligence.
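
As a toy illustration, here is a minimal numerical sketch of that trade-off in Python. It assumes the multiplicative factoring above actually holds (itself a strong assumption), and every number in it is an illustrative placeholder, not a measurement:

```python
# Toy model of the Total Intelligence factoring: I = Q * S * N.
# All numbers are illustrative placeholders, not measurements.

def total_intelligence(quality: float, speed: float, num_agents: float) -> float:
    """Total intelligence under the (assumed) multiplicative factoring."""
    return quality * speed * num_agents

# Baseline: a human-level researcher.
human = total_intelligence(quality=1.0, speed=1.0, num_agents=1.0)

# A hypothetical slowed ASI: 100x human quality of thought,
# run as a single agent on a substrate 1000x slower.
slowed_asi = total_intelligence(quality=100.0, speed=0.001, num_agents=1.0)

print(f"human:      {human:.3f}")       # 1.000
print(f"slowed ASI: {slowed_asi:.3f}")  # 0.100 -- below human throughput,
                                        # yet each output is 100x quality
```

The point is only that the product can be held at or below human throughput while the Quality factor grows, provided Speed can genuinely be capped.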

Varying Total Intelligence by varying the Number of Agents is not a novel idea. It was the core concept behind Paul Christiano’s work on iterated amplification (and distillation), which explored using many aligned agents to align a single smarter agent. However, Eliezer Yudkowsky has questioned whether quality and alignment can be cleanly factored out in this way. That leaves Speed as the other variable to explore.

Intuition pump

Imagine a scenario where a plant, through incredibly slow biological processes, outputs a scientific theory comparable to Einstein’s theory of relativity, but over a span of 30-40 years: roughly 3x-4x slower than Einstein, who took about 10 years to develop his. Despite the slow pace, the quality of the output would be unquestionably high.

By maintaining high-quality outputs while drastically reducing speed, we give ourselves the ability to evaluate the outputs of a superintelligent AI at a human-comprehensible pace. This approach reduces the risk of a rapid, uncontrollable intelligence explosion (or “foom”).

Where this might fit in the alignment strategy landscape

Many creative approaches to AI alignment have been proposed over the years. Some, like mechanistic interpretability (“mech interp”), are better considered building blocks than total solutions. Others, like scalable oversight, address the whole problem but arguably side-step its crux, which, according to MIRI, is the development of a fully rigorous Agent Foundations theory.

Slowed ASI is probably most similar to a scalable oversight strategy. One idea is to use it as a form of defense-in-depth, where the slower substrate is layered on top of other alignment strategies to decrease the probability of a sharp left turn.

Another potentially big idea would be to substitute this approach (varying intelligence Speed) for the one in Christiano’s iterated distillation and amplification (IDA) proposal (varying Number of Agents). That would turn this into a possible full solution and sidestep Yudkowsky’s main gripe with IDA.
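
To make that substitution concrete, here is a minimal sketch of where the slow substrate would sit in such a loop. Every function below is a hypothetical stub of my own invention; this is not Christiano’s algorithm, just an illustration of the proposed variant:

```python
# Hypothetical sketch of speed-varied IDA. Every name below is an
# illustrative stub, not part of any existing proposal or library.

def seed_model():
    return {"quality": 1.0}

def train_on_fast_hardware(model):
    # Amplification/distillation on conventional hardware raises Quality.
    return {"quality": model["quality"] * 2.0}

def run_on_slow_substrate(model, task):
    # The substrate caps Speed, so outputs arrive in human time.
    return f"answer to {task!r} at quality {model['quality']:.1f}"

def humans_evaluate(output):
    # Human overseers can keep pace because the substrate is slow.
    return "approved"

model = seed_model()
for round_num in range(3):
    model = train_on_fast_hardware(model)
    output = run_on_slow_substrate(model, task="toy physics problem")
    verdict = humans_evaluate(output)
    print(round_num, output, verdict)
```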

Isn’t this just a boxed AI?

No. A boxed ASI remains superintelligent at full speed, capable of outwitting its constraints (human or otherwise) to escape. A slowed AI, on the other hand, while still producing reasoning of superhuman quality, operates at a pace much slower than human thought, allowing us to monitor it and intervene as necessary.

Possible computing substrates

Current silicon-based computers, even old ones, already outpace human thought in specific domains (e.g., a pocket calculator doing arithmetic). We need substrates inherently slower than human cognition. Potential candidates include:

  • DNA Computers: Use biological molecules, specifically DNA strands, to perform computations through biochemical reactions.

  • Chemical Computers (e.g., Belousov-Zhabotinsky (BZ) Reaction Computers): The BZ reaction is a classic non-equilibrium thermodynamic process that exhibits oscillating chemical reactions. Such a system can perform logical operations by interpreting the oscillations as computational signals (a toy simulation of this follows the list).

  • Enzyme-based Computers: Use enzymes to catalyze reactions that represent and process information; reaction rates and outcomes can be controlled to perform computations.

  • Membrane Computers (P Systems): Use biological membranes to compartmentalize chemical reactions; the transport of molecules across membranes can simulate computational steps.
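
For concreteness, here is a minimal Python sketch of the BZ idea, simulating the standard two-variable Oregonator model of the reaction and thresholding its oscillations into a binary logic level. The model and parameter values are textbook ones, but the threshold readout is my own toy illustration, not how a real BZ computer would be engineered:

```python
# Toy simulation of the two-variable Oregonator model of the BZ reaction,
# with its oscillations thresholded into a binary "logic level".
# Parameters are common textbook values (Tyson's scaling); this illustrates
# the readout idea only, not an actual chemical-computer design.

EPS, Q, F = 0.04, 0.0008, 1.0    # timescale separation, excitability, stoichiometry
DT, T_END = 1e-5, 20.0           # small step: the fast variable is stiff

u, v = 0.1, 0.1                  # dimensionless concentrations
t, prev, ticks = 0.0, 0, 0
while t < T_END:
    du = (u * (1 - u) - F * v * (u - Q) / (u + Q)) / EPS   # fast activator
    dv = u - v                                             # slow recovery
    u, v = u + DT * du, v + DT * dv
    t += DT
    level = 1 if u > 0.5 else 0  # threshold the oscillation -> logic level
    if prev == 0 and level == 1:
        ticks += 1               # rising edge = one "tick" of the clock
    prev = level

print(f"logical ticks in {T_END} dimensionless time units: {ticks}")
```

The property this proposal cares about is that the tick rate is set by reaction kinetics rather than by an adjustable clock: you cannot simply overclock the chemistry.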

But how do we build it?

Training an AI on a slow substrate would present unique challenges. Initially, we might need to bootstrap the training process on modern, fast hardware before transferring the developed AI to a slower substrate to finalize its development. This phased approach could keep training tractable while mitigating the risks associated with training and evaluating powerful models at full speed.
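
Before any exotic substrate exists, the evaluation side of this could be prototyped in ordinary software. Here is a minimal sketch of that idea; the class and its throttling scheme are my own illustration, not an established technique:

```python
import time

class ThrottledModel:
    """Wraps a trained model so it performs at most `steps_per_hour`
    forward passes, crudely emulating an inherently slow substrate.

    Note: software throttling is only an emulation aid. Unlike a truly
    slow substrate, it could in principle be switched off, which is
    exactly the boxing weakness discussed above.
    """

    def __init__(self, model, steps_per_hour: float):
        self.model = model
        self.min_interval = 3600.0 / steps_per_hour  # seconds between steps
        self._last_step = 0.0

    def forward(self, x):
        # Sleep until the minimum interval since the last step has elapsed.
        wait = self.min_interval - (time.monotonic() - self._last_step)
        if wait > 0:
            time.sleep(wait)
        self._last_step = time.monotonic()
        return self.model(x)

# Usage sketch: at most one forward pass every six minutes.
# slow = ThrottledModel(trained_model, steps_per_hour=10)
# answer = slow.forward(prompt)
```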

tl;dr

The main idea is that by adopting slower computing substrates for AI, we could create a controllable environment to develop and evaluate superintelligence.

I wrote this post in order to wonder out loud; my hope is that somebody has already thought of this and can point me to rebuttals or prior explorations of the topic. Perhaps I’ve missed something obvious. Let me hear it.