I think the basic argument was: Self-reference, tiling, etc. are core parts of cognition/optimization/reasoning. Current theory fails to account for those things. If humanity generally gets less confused in its AI concepts/frameworks, and gets more knowledge about core parts of cognition/optimization/reasoning, then there’s more hope that humanity will be able to (a) deliberately steer toward relatively transparent and “aimable” approaches to AGI, and (b) understand the first AGI systems well enough to align them in practice.
Understanding your system obviously helps with aligning an AGI to the task “recursively self-improve to the point where you can do CEV” and the task “do CEV”, but it also helps with aligning an AGI to the task “do a pivotal act” / “place, onto this particular plate here, two strawberries identical down to the cellular but not molecular level”.