I’d like to offer an alternative to the third point. Suppose we have built a highly capable AI that we don’t yet trust, and that we have also managed to coordinate as a society and put defensive mechanisms in place to get to that point. I think we can still get high gains without testing the AI in a low-stakes environment and then immediately moving it to a high-stakes one (as described in the dictator analogy).
It is feasible to design a sandboxed environment that is formally proven to be secure, in the sense that it cannot be hacked into or escaped from, and the AI cannot be deliberately let out of it (which, in particular, rules out AI-box-experiment scenarios). This is easier still for AI systems, because they typically involve a very narrow set of operations and interfaces (essentially basic arithmetic, plus very constrained input and output channels).
In this scenario, the AI could still offer significant benefits. For example, it could provide formally verified (and hence safe) proofs in general mathematics or of software correctness, including for novel AI system designs proven to be aligned according to some notion of alignment that has, by then, been formally defined. More generally, it could assist with research, e.g., with its output limited in size so that humans can comprehend it. I am sure we can come up with many more examples where a highly constrained yet highly capable cognitive system is still extremely beneficial while being much less dangerous.
(To be clear, I am not claiming that this approach is easy to achieve or the most likely path forward. However, it is an option that humanity could coordinate on.)