JuliaHP comments on Orthogonal’s Formal-Goal Alignment theory of change

JuliaHP 28 Jun 2023 14:03 UTC
9 points
We recently modified QACI to score actions rather than worlds. This should allow weaker systems that are inner-aligned to QACI to output weaker, non-DSA actions, such as the textbook from the future, or just human-readable advice on how to end the acute risk period. Stronger systems might output instructions for how to go about solving corrigible AI, or something to that effect.

As for diamonds, we believe this is actually a harder problem than alignment, and that it is a mistake to aim at it. Solving diamond-maximization requires us to point at what we mean by “maximizing diamonds” in physics in a way that is ontologically robust. QACI instead gives us an easier target: informational data blobs which causally relate to a human. The cost is that we give up power to that human user to implement their values, but this is no issue, since that is what we wanted to do anyway. If the humans in the QACI interval were actually pursuing diamond-maximization instead of some form of human values, QACI would solve diamond-maximization.