Similarly, if this were the only problem, then people would just put more effort into determining whether an AGI is aligned before turning it on, or they would not build one at all.
The traditional arguments for why AGI could go wrong imply that it could go wrong even if you put an immense amount of effort into catching and patching errors. In machine learning, when we validate our models, we ideally do so in an environment that we think matches the real world, but it's common for the real world to turn out to be subtly different. In the extreme case, you could perform comprehensive testing and verification and still fail to properly assess the real-world impact.
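As a minimal toy sketch of that validation gap (not drawn from the original argument): a model can look excellent on a held-out set from the training distribution, then degrade sharply once the deployment distribution differs in a subtle way, for example when a spurious correlation it relied on no longer holds. The `make_data` helper and the `spurious_correlated` flag below are hypothetical names invented for illustration, assuming NumPy and scikit-learn are available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, spurious_correlated=True):
    # The "real" signal is only weakly predictive of the label.
    signal = rng.normal(size=n)
    y = (signal + rng.normal(scale=1.0, size=n) > 0).astype(int)
    if spurious_correlated:
        # In the lab, a second feature happens to track the label almost perfectly.
        spurious = (2 * y - 1) + rng.normal(scale=0.3, size=n)
    else:
        # In deployment, that correlation quietly no longer holds.
        spurious = rng.normal(size=n)
    return np.column_stack([signal, spurious]), y

# Train and validate on data where the spurious correlation holds.
X_train, y_train = make_data(10_000)
X_val, y_val = make_data(10_000)
model = LogisticRegression().fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))   # looks great

# Deploy into a world that is only subtly different.
X_dep, y_dep = make_data(10_000, spurious_correlated=False)
print("deployment accuracy:", model.score(X_dep, y_dep))   # much worse
```

The point of the sketch is just that comprehensive testing against the environment you expected does not guarantee anything about the environment you actually get.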
If the cost of properly ensuring safety is high enough, there is a point at which people will begin deploying unsafe systems anyway. This is inevitable, unless you could somehow either ban computer hardware or stop AI research insights from proliferating.