(Eli’s personal notes, mostly for his own understanding. Feel free to respond if you want.)
My summary of what Eliezer is saying (in the middle part of the post):
The imitation-agents that make up the AI must be either _very_ exact imitations (of the original agents), or not very exact imitations.
If the agents are very exact imitations, then...
1. You need an enormous amount of computational power to get them to work, and
2. They must already be very superintelligent, because imitating a human exactly is a very AI-complete task. If Paul’s proposal depends on exact imitation, that’s to say that it doesn’t work until we’ve already reached very superintelligent capability, which seems alarming.
If the agents are not very exact imitations, then...
Either,
1. Your agents aren’t very intelligent, or
2. You run into the x-and-only-x problem, and your inexact imitations don’t guarantee safety. An agent can imitate the human, but also be doing all kinds of things that are unsafe.
Paul seems to respond by saying that:
1. We’re in the inexact imitation paradigm.
2. He intends to solve the x-and-only-x problem via other external checks (which, crucially, rely on having a smarter agent that you can trust).