The formal goal is a pointer

When I introduce people to plans like QACI, they often have objections like “How is an AI going to do all of the simulating necessary to calculate this?”, “If our technology is good enough to calculate this with any level of precision, we can probably just upload some humans”, or just “That’s not computable.”

I think these kinds of objections are missing the point of formal goal alignment and maybe even outer alignment in general.

To formally align an ASI to human (or your) values, we do not need to actually know those values. We only need to strongly point to them.

AI will figure out our values. Whether it’s aligned or not, a recursively self-improving AI will eventually develop a very good model of our values, as part of a world model that is better than ours in every way.

So (outer) alignment is not about telling the AI our values; it will already know them. Alignment is about giving the AI a utility function that strongly points to them.

That means that if we have a process, however intractable or uncomputable, that we know would eventually lead to our CEV (coherent extrapolated volition), the AI will know that as well, figure out our CEV in a much smarter way, and maximize it.
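One way to make the “pointer” framing concrete (this is my own sketch of the general shape, not Tammy’s actual formalism): let $Q$ be the idealized, possibly uncomputable deliberation process, and let $U_Q$ be the utility function it would eventually output. The formal goal asks the AI to choose the policy

$$\pi^* = \arg\max_{\pi} \; \mathbb{E}\big[\, U_Q(\mathrm{outcome}(\pi)) \,\big],$$

where the expectation is taken over the AI’s own uncertainty, including its uncertainty about what $U_Q$ actually is. Evaluating this well requires a good model of $Q$’s output, not the ability to run $Q$ step by step.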

Say that we have a formally aligned AI and give it something like QACI as its formal goal. If QACI works, the AI will quickly think, “Oh. This utility function mostly just reduces to human values. Time to build utopia!” If it doesn’t work, the AI will quickly think, “LOL. These idiot humans tried to point to their values but failed! Time to maximize this other thing instead!”

A good illustration of the success scenario is Tammy’s narrative of QACI.[1]

There are lots of problems with QACI (and formal alignment in general), and I will probably write posts about those at some point, but “It’s not computable” is not one of them.

  1. Though, in real life, I think AI1 would converge on human values much more quickly, without much (or maybe even any) simulation.