The core motivation for formal alignment, for me, is that a working solution is at least eventually aligned: there is an objective answer to the question "will maximizing this with arbitrary capabilities produce desirable outcomes?" where the answer does not depend, at the limit, on what is doing the maximization.
I don't know about other proposals because I'm not familiar with them, but Metaethical AI actually describes the machinery of the agent, hence "the answer" does depend "on what is doing the maximization".
There is also davidad's Open Agency Architecture:
https://www.alignmentforum.org/posts/pKSmEkSQJsCSTK6nH/an-open-agency-architecture-for-safe-transformative-ai
Nice to see someone who wants to directly tackle the big problem. Also nice to see someone who appreciates June Ku’s work.