QACI and PreDCA are certainly the most galaxy-brained alignment plans I’ve seen, and for reasons I can understand: if you’re a superintelligence deciding what to do in a largely unknown situation, it makes sense to start by considering all possible worlds, or even all possible models of all possible worlds, and then narrowing your focus. I haven’t delved into every detail of their algorithms or their metaphysics, but the overall approach makes sense to me. (As for the complaint that this is all wildly uncomputable: what we are seeing here is the idealized version of the calculation; the real thing would use heuristics.)
I also think the part of the plan that involves satisfying the values of the indicated person is extremely “under-theorized”, and that’s putting it mildly. AI_0 is going to propose a design for AI_1, and the criterion is that the person will approve of the proposed design when they see it. I suppose such a moment is inevitable in any alignment plan—a moment when some human being or group of human beings must decide, using their natural faculties of judgment, whether superalignment has been solved and it’s now safe to set the process in motion. But in that case, we need a much better idea of what it would mean to be ready for that moment and that responsibility.