In our chat about this, Eliezer said that, aside from the difficulty of making a human-level mathematical philosophy engine aligned with our goals, another significant disagreement with Paul that is relevant here is that Eliezer thinks it’s likely we’ll use low-level self-improvement on the road to human-level AI; he used the analogies of programmers using compilers instead of writing machine code directly, and of EURISKO helping Lenat. (Again, hoping I’m not misrepresenting.)
This seems like a plausible scenario to me, but I’m not convinced it argues for the kind of work we’re doing currently; it seems fairly likely to me that “reliably helping the programmers to do low-level tasks on the way to human-level AI” can be handled by having a small protected metalayer, which only the humans can modify. This sort of architecture seems very problematic for an AI acting in the world, since it makes Cartesian assumptions and we don’t want something that’s bound to a single architecture like that; that’s why we haven’t been looking in this direction. But if this is a significant reason for studying self-reference, we should be exploring the obvious tricks involving a protected metalayer.
For example, the Milawa theorem prover has a small initial verifier, which can be replaced by a different verifier that accepts proofs in a more powerful language, given a proof that the new verifier only outputs theorems that the old verifier would also have output on some input. How does this avoid running into the Löbstacle? (That is, how can the new verifier in turn allow the user to replace it, given a proof of the analogous theorem in its language?) The answer is that there is a protected metalevel: the proof that the new verifier only outputs theorems the old verifier would have output does not need to prove anything about the code that is willing to switch out the new verifier for an even newer verifier.
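To make the shape of this concrete, here is a minimal structural sketch in Python. It is purely illustrative and not Milawa’s actual code (Milawa is written in a Lisp dialect); the names `ProtectedMetalayer`, `VerifierSwapRequest`, and `reflection_statement` are my own placeholders, and `reflection_statement` just stands in for however the soundness claim about a new verifier gets formalized.

```python
from dataclasses import dataclass
from typing import Callable

# A verifier takes a claimed theorem and a proof object and reports
# whether the proof establishes the theorem in its logic.
Verifier = Callable[[str, object], bool]


@dataclass
class VerifierSwapRequest:
    new_verifier: Verifier
    # A proof, in the *current* verifier's language, that the new verifier
    # only accepts theorems the current verifier would accept on some proof.
    soundness_proof: object


def reflection_statement(new_verifier: Verifier) -> str:
    """Placeholder for the formalized claim: 'every theorem the new verifier
    accepts is one the current verifier would accept given some proof'."""
    return f"sound({getattr(new_verifier, '__name__', 'new_verifier')})"


class ProtectedMetalayer:
    """The protected metalevel: only humans may modify this code.

    Verifiers only check object-level proofs; their logics never need to
    mention this swapping machinery, which is how the Löbstacle is avoided.
    """

    def __init__(self, initial_verifier: Verifier) -> None:
        self.current = initial_verifier

    def check(self, theorem: str, proof: object) -> bool:
        # Ordinary use: ask the current verifier to check a proof.
        return self.current(theorem, proof)

    def replace_verifier(self, request: VerifierSwapRequest) -> bool:
        # Accept the new verifier only if the *current* verifier checks a
        # proof of the reflection statement about it; the proof says
        # nothing about replace_verifier itself.
        claim = reflection_statement(request.new_verifier)
        if self.current(claim, request.soundness_proof):
            self.current = request.new_verifier
            return True
        return False
```

The point is only the architecture: the metalayer checks the reflection statement with the current verifier and then performs the swap itself, so the new verifier’s soundness proof never has to say anything about `replace_verifier`.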
(Thanks for the feedback!)