I'm not sure what you mean by this, or by "more object-level solution to alignment". Could you explain more?
The proposed setup can be seen as a self-improving AI, but a pretty opaque one. To explain why it makes a particular decision, we must appeal to anthropomorphism, like “our team of researchers wouldn’t do such a stupid thing”. That seems prone to wishful thinking. I would prefer to launch an AI for which at least some decisions have non-anthropomorphic explanations.