In our chat about this, Eliezer said that, aside from the difficulty of making a human-level mathematical philosophy engine aligned with our goals, an additional significant disagreement with Paul, relevant here, is that Eliezer thinks it’s likely that we’ll use low-level self-improvement on the road to human-level AI; he used the analogies of programmers using compilers instead of writing machine code directly, and of EURISKO helping Lenat. (Again, hoping I’m not misrepresenting.)
This seems like a plausible scenario to me, but I’m not convinced it argues for the kind of work we’re doing currently; it seems fairly likely to me that “reliably helping the programmers with low-level tasks on the way to human-level AI” can be handled by a small protected metalayer, which only the humans can modify. This sort of architecture seems very problematic for an AI acting in the world (it makes Cartesian assumptions, and we don’t want something bound to a single architecture like that), which is why we haven’t been looking in this direction; but if this is a significant reason for studying self-reference, we should be exploring the obvious tricks involving a protected metalayer.
For example, the Milawa theorem prover has a small initial verifier, which can be replaced by a different verifier that accepts proofs in a more powerful language, given a proof that the new verifier only outputs theorems that the old verifier would also have output on some input. How does this avoid running into the Löbstacle? (I.e., how can the new verifier also allow the user to replace it given a proof of the analogous theorem in its language?) The answer is, by having a protected metalevel; the proof that the new verifier only outputs theorems the old verifier would have output does not need to prove anything about the code that is willing to switch out the new verifier for an even newer verifier.
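To make the shape of this concrete, here is a minimal structural sketch in Python of a protected metalayer that swaps in a new verifier only when the current verifier accepts a conservativity proof. This is not Milawa’s actual code or API; every name, signature, and the way the conservativity statement is represented are illustrative assumptions.

```python
# A minimal structural sketch of the "protected metalayer" idea, loosely
# modelled on the Milawa-style verifier replacement described above.  This is
# NOT Milawa's actual code or API; every name and signature here is an
# illustrative assumption.

from typing import Callable, Optional

# A verifier maps a candidate proof to the statement it proves,
# or None if the proof is rejected.
Verifier = Callable[[bytes], Optional[str]]


class ProtectedMetalayer:
    """The metalevel that only the humans can modify.

    It holds the current verifier and swaps it for a new one when handed a
    conservativity proof that the *current* verifier accepts.  The replacement
    logic itself never has to be reasoned about inside the object-level logic,
    which is how the Löbian self-reference is avoided.
    """

    def __init__(self, initial_verifier: Verifier) -> None:
        self.current = initial_verifier

    def verify(self, proof: bytes) -> Optional[str]:
        """Check a proof with whichever verifier is currently installed."""
        return self.current(proof)

    def upgrade(self, new_verifier: Verifier, conservativity_proof: bytes,
                conservativity_statement: str) -> bool:
        """Install `new_verifier` iff the current verifier accepts a proof of
        `conservativity_statement`, which is meant to encode "every theorem
        the new verifier outputs, the old verifier would also have output on
        some input".  How that statement is formalized is elided here."""
        if self.current(conservativity_proof) == conservativity_statement:
            self.current = new_verifier
            return True
        return False


if __name__ == "__main__":
    # Toy demo of the control flow only: the old verifier accepts exactly one
    # hard-coded proof blob, whose conclusion is the conservativity statement
    # for a slightly more permissive new verifier.
    STATEMENT = "new_verifier is conservative over old_verifier"

    def old_verifier(proof: bytes) -> Optional[str]:
        return STATEMENT if proof == b"toy-conservativity-proof" else None

    def new_verifier(proof: bytes) -> Optional[str]:
        return proof.decode() if proof.startswith(b"thm:") else None

    meta = ProtectedMetalayer(old_verifier)
    assert meta.upgrade(new_verifier, b"toy-conservativity-proof", STATEMENT)
    assert meta.verify(b"thm:0=0") == "thm:0=0"
```

Note that the conservativity statement never mentions the upgrade mechanism itself; each new verifier is installed through the same unmodified metalayer, so nothing at the object level has to prove anything about the code doing the swapping.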
I agree that many parts of AI research are already automated, and by the time we have broadly human-level AI, many more will be. I would not be surprised if the great majority of tasks in current AI research are automated long before we have broadly human-level AI.
I wouldn’t normally describe this as “low-level self-improvement,” but that seems like a semantic issue.
I am skeptical of the Eurisko example (and of claims about Eurisko in general) but I don’t know if it’s relevant to this disagreement.