I strongly agree with John that “what we really want to do is to not build a thing which needs to be boxed in the first place.” This is indeed the ultimate security mindset.
I also strongly agree that relying on a “fancy,” multifaceted box that looks secure due to its complexity, but may not be (especially to a superintelligent AGI), is not security mindset.
One definition of security mindset is “suppose that anything that could go wrong, will go wrong.” So even if we have good reason to believe we’ve built an aligned superintelligent AGI, we should still maintain high-quality (not merely numerous) security failsafes, in case our knowledge fails to generalize to the high-capabilities domain. These failsafes would let us efficiently and vigilantly test whether the AGI really is as aligned as we thought. This is an example of applying security mindset to guard against overconfidence in our current assumptions.