If we can do this, it would give us a possible route to a controlled intelligence explosion: the AI designs a more capable successor AI because that is the task it has been assigned, rather than for instrumental reasons, and humans can inspect the result and decide whether or not to run it.
How would humans decide whether something designed by a superintelligent AI is safe to run? It doesn't sound safe by design: even if we rule out safety-compromising divergence within the toy intelligence explosion, how would we know that the successor AI is safe for the real world? Nobody has designed it for that. Nor should we expect to catch potential problems in the design by our own inspection; we can't even do that reliably for designs produced by non-superintelligent software developers.
This reminds me of the problems with boxed AIs: all is well as long as they can't affect the real world, but that is a limit on their usefulness, and if they are superintelligent we might not even see them leaking out of the box.