The idea is not “take an arbitrary superhuman AI and then verify it’s destined to be well behaved” but rather “develop a mathematical framework that lets you design, from the ground up, a specific AI that will (provably) remain well behaved, even though you can’t determine, for arbitrary AIs, whether they’ll be well behaved.”