the core motivation for formal alignment, for me, is that a working solution is at least eventually aligned: there is an objective answer to the question “will maximizing this with arbitrary capabilities produce desirable outcomes?” where the answer does not depend, at the limit, on what does the maximization.
I don’t know about other proposals because I’m not familiar with them, but Metaethical AI actually describes the machinery of the agent, hence “the answer” does depend “on what does the maximization”.