My understanding is that Paul thinks breaking the evolution analogy is important, but a lot less difficult than Eliezer thinks it is.
My basic take on the evolution analogy:
Evolution wasn’t trying to solve the robustness problem at all. It’s analogous to using existing ML while making zero effort to avoid catastrophic generalization failures. I’m not convinced the analogy tells us much about how hard this problem will be (rather than just showing that the problem exists). Even today, if we were trying to train an AI to care about X, we’d e.g. train on situations where X diverges from other possible goals, or where it looks like the agent isn’t being monitored as part of the training process. We’d try a variety of simple techniques to understand what the AI is thinking or anticipating, and use that information to help construct tricky situations or evaluate behavior. And so on. In the real world we are going to use much more sophisticated versions of those techniques, but the analogy doesn’t even engage with the most basic versions. (A toy sketch of this kind of divergence-probing appears after this list.)
In practice I think that we can use a system nearly as smart as the AI to guide the AI’s training—before we have a super-duper-intelligent AI we have (or could choose to train) a superintelligent AI, and before that we can have a pretty intelligent AI. This is important, because the Nate/Eliezer response to the previous bullet tends to assume a huge intelligence gap between the intelligence that’s being trained and the intelligence that’s doing the overseeing. That looks like an unreasonable situation to me even if we can’t get amplification to work. (Amplification lets us have an oversight process smarter than the system we are training. But at a minimum we could get an overseer only a little bit less smart.) We’ve had a bit of argument about this, but I’ve found the argument really unconvincing and also don’t expect it to convince others. (A second sketch after this list illustrates the near-peer oversight point.)
The thing that we are doing is probably much easier than “evolve a species to care about precise goal X.” Training an AI to be corrigible is much closer to trying to breed a creature for docility than trying to breed it to care about some particular complex thing. I think there is a reasonable chance that this would just work even in the evolution analogy and even without any technical progress, i.e. that humans could already breed a race of docile superhumans by using the pretty basic approaches we know of now.
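To make the “train on situations where X diverges from other possible goals” idea concrete, here is a deliberately minimal sketch, not drawn from any real training pipeline: it constructs toy cases where the intended goal and a proxy goal recommend different actions (including “unmonitored-looking” cases) and corrects the agent on exactly those cases. Every name in it (intended_goal, proxy_goal, ToyAgent) is a made-up stand-in.

```python
import random

# Two stand-in actions; a stand-in "intended goal" X versus a proxy goal
# that agrees with X only when the agent looks monitored.
ACTIONS = ["honest", "self_serving"]

def intended_goal(action, state):
    """The goal X we actually care about: always prefers honesty."""
    return 1.0 if action == "honest" else 0.0

def proxy_goal(action, state):
    """A proxy that matches X when monitored but diverges when not."""
    if state["monitored"]:
        return 1.0 if action == "honest" else 0.0
    return 1.0 if action == "self_serving" else 0.0

def divergence_cases(n):
    """Sample states where X and the proxy recommend different actions."""
    cases = []
    while len(cases) < n:
        state = {"monitored": random.random() < 0.5}
        best_by_x = max(ACTIONS, key=lambda a: intended_goal(a, state))
        best_by_proxy = max(ACTIONS, key=lambda a: proxy_goal(a, state))
        if best_by_x != best_by_proxy:
            cases.append(state)
    return cases

class ToyAgent:
    """Starts out following the proxy; can be corrected by training."""
    def __init__(self):
        self.honest_when_unmonitored = False

    def act(self, state):
        if state["monitored"] or self.honest_when_unmonitored:
            return "honest"
        return "self_serving"

    def update(self, state, correct_action):
        # "Training" here is just patching behavior on the probed situation.
        if not state["monitored"] and correct_action == "honest":
            self.honest_when_unmonitored = True

agent = ToyAgent()
for state in divergence_cases(20):
    desired = max(ACTIONS, key=lambda a: intended_goal(a, state))
    if agent.act(state) != desired:   # caught pursuing the proxy goal
        agent.update(state, desired)

print(agent.act({"monitored": False}))  # now "honest" even when unmonitored
```

The relevant disanalogy is that evolution never deliberately constructed divergence cases like these at all, which is the point of the bullet above.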
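The near-peer oversight point can also be illustrated with a toy numerical sketch, again not a claim about any real system: each generation is only a modest capability step past its overseer, so the fraction of the trainee’s tasks the overseer can check stays roughly constant instead of collapsing the way it would under a huge intelligence gap. The capability numbers and the success-probability formula are invented purely for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class ToyModel:
    capability: float  # crude stand-in for "how smart this generation is"

    def solves(self, difficulty: float) -> bool:
        # Succeeds with probability capability / (capability + difficulty).
        return random.random() < self.capability / (self.capability + difficulty)

def oversight_coverage(overseer: ToyModel, trainee: ToyModel, tasks: int = 2000) -> float:
    """Fraction of tasks at the trainee's level that the overseer can check."""
    checked = 0
    for _ in range(tasks):
        difficulty = random.uniform(0.5, 2.0) * trainee.capability
        if overseer.solves(difficulty):
            checked += 1
    return checked / tasks

# Each generation oversees the training of the next, slightly stronger one.
overseer = ToyModel(capability=1.0)
for _ in range(5):
    trainee = ToyModel(capability=overseer.capability * 1.2)  # modest step
    coverage = oversight_coverage(overseer, trainee)
    print(f"overseer {overseer.capability:.2f} vs trainee {trainee.capability:.2f}: "
          f"checks ~{coverage:.0%} of tasks")
    overseer = trainee

# Contrast: a single fixed overseer facing the final generation directly.
fixed = ToyModel(capability=1.0)
print(f"fixed overseer vs final trainee: checks ~{oversight_coverage(fixed, overseer):.0%}")
```

The only point being made is structural: with a chain of overseers each only slightly behind its trainee, no overseer ever has to evaluate a system vastly smarter than itself.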
“Evolution wasn’t trying to solve the robustness problem at all.”—Agreed that this makes the analogy weaker. And, to state the obvious, everyone doing safety work at MIRI and OpenAI agrees that there’s some way to do neglected-by-evolution engineering work that gets you safe+useful AGI, though they disagree about the kind and amount of work.
The docility analogy seems to be closely connected to important underlying disagreements.
Conversation also continues here.