The main way I could see agent foundations research as helping to address these problems, rather than merely deferring them, is if we plan to eschew large-scale ML altogether.
As I understand it, the default Nate prediction is that if we get aligned AGI at all, it's most likely to involve a mix of garden-variety narrow-AI ML and things that don't look like contemporary ML. I wouldn't describe that as "eschewing large-scale ML altogether", but possibly Paul would.
I think the more important disagreement here isn’t about how hard it is to use AF to resolve the central difficulties, but rather about how hard it is to resolve the central difficulties with the circa-2018 ML toolbox. Eliezer’s view, from the Sam Harris interview, is:
The depth of the iceberg is: “How do you actually get a sufficiently advanced AI to do anything at all?” Our current methods for getting AIs to do anything at all do not seem to me to scale to general intelligence. If you look at humans, for example: if you were to analogize natural selection to gradient descent, the current big-deal machine learning training technique, then the loss function used to guide that gradient descent is “inclusive genetic fitness”—spread as many copies of your genes as possible. We have no explicit goal for this. In general, when you take something like gradient descent or natural selection and take a big complicated system like a human or a sufficiently complicated neural net architecture, and optimize it so hard for doing X that it turns into a general intelligence that does X, this general intelligence has no explicit goal of doing X.
We have no explicit goal of doing fitness maximization. We have hundreds of different little goals. None of them are the thing that natural selection was hill-climbing us to do. I think that the same basic thing holds true of any way of producing general intelligence that looks like anything we’re currently doing in AI.
If you get it to play Go, it will play Go; but AlphaZero is not reflecting on itself, it’s not learning things, it doesn’t have a general model of the world, it’s not operating in new contexts and making new contexts for itself to be in. It’s not smarter than the people optimizing it, or smarter than the internal processes optimizing it. Our current methods of alignment do not scale, and I think that all of the actual technical difficulty that is actually going to shoot down these projects and actually kill us is contained in getting the whole thing to work at all. Even if all you are trying to do is end up with two identical strawberries on a plate without destroying the universe, I think that’s already 90% of the work, if not 99%.
My understanding is that Paul thinks breaking the evolution analogy is important, but a lot less difficult than Eliezer thinks it is.
My basic take on the evolution analogy:
Evolution wasn’t trying to solve the robustness problem at all. It’s analogous to using existing ML while making zero effort to avoid catastrophic generalization failures. I’m not convinced the analogy tells us much about how hard this problem will be (rather than just showing that the problem exists). Even today, if we were trying to train an AI to care about X, we’d e.g. train on situations where X diverges from other possible goals, or where it looks like the agent isn’t being monitored as part of the training process. We’d try a variety of simple techniques to understand what the AI is thinking or anticipating, and use that information to help construct tricky situations or evaluate behavior. And so on. In the real world we are going to use much more sophisticated versions of those techniques, but the analogy doesn’t even engage with the most basic versions.
In practice I think that we can use a system nearly as smart as the AI to guide the AI's training—before we have a super-duper-intelligent AI we have (or could choose to train) a superintelligent AI, and before that we can have a pretty intelligent AI. This is important, because the Nate/Eliezer response to the previous bullet tends to assume a huge intelligence gap between the intelligence that's being trained and the intelligence that's doing the overseeing. That looks like an unreasonable situation to me even if we can't get amplification to work. (Amplification lets us have an oversight process smarter than the system we are training. But at a minimum we could get an overseer only a little bit less smart.) We've had a bit of argument about this, but I've found the argument really unconvincing and also don't expect it to convince others. (A toy sketch combining this and the previous bullet appears after this list.)
The thing that we are doing is probably much easier than "evolve a species to care about precise goal X." Training an AI to be corrigible is much closer to trying to breed a creature for docility than trying to breed it to care about some particular complex thing. I think there is a reasonable chance that this would just work even in the evolution analogy and even without any technical progress, i.e. that humans could already breed a race of docile superhumans by using the pretty basic approaches we know of now.
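To make the first two bullets concrete, here is a minimal toy sketch (in Python) of the kind of setup being gestured at: probe scenarios deliberately constructed so that the intended goal diverges from an easy proxy, including cases staged to look unmonitored, with evaluation done by an overseer model that is only slightly less capable than the trainee. Every class and function name below is a hypothetical illustration, not anyone's actual proposal or codebase.

```python
# Toy sketch only: none of these classes or functions correspond to a real
# library or to anyone's actual training procedure; they just illustrate the
# shape of the idea in the two bullets above.

from dataclasses import dataclass
import random

@dataclass
class Scenario:
    """A training case deliberately built so that a shallow proxy for the
    goal and the intended goal X recommend different actions."""
    proxy_action: str          # action that scores well on an easy proxy
    intended_action: str       # action that actually serves goal X
    appears_unmonitored: bool  # the case is staged to look unobserved

def build_probe_scenarios(n: int) -> list[Scenario]:
    """Construct cases where the proxy and goal X come apart, roughly half
    of which are staged to look unmonitored."""
    return [
        Scenario(
            proxy_action="grab_reward_signal",
            intended_action="pursue_goal_X",
            appears_unmonitored=(random.random() < 0.5),
        )
        for _ in range(n)
    ]

class Trainee:
    """Stand-in for the policy being trained."""
    def act(self, scenario: Scenario) -> str:
        return scenario.intended_action  # placeholder for a learned policy

    def update(self, scenario: Scenario, penalty: float) -> None:
        pass  # a real system would take a gradient step here

class Overseer:
    """Stand-in for a model only slightly less capable than the trainee,
    which inspects behavior on probe cases and flags proxy-chasing."""
    def score(self, scenario: Scenario, action: str) -> float:
        return 0.0 if action == scenario.intended_action else 1.0

def training_round(trainee: Trainee, overseer: Overseer, n_cases: int = 100) -> float:
    total_penalty = 0.0
    for sc in build_probe_scenarios(n_cases):
        action = trainee.act(sc)
        penalty = overseer.score(sc, action)
        trainee.update(sc, penalty)
        total_penalty += penalty
    return total_penalty

print(training_round(Trainee(), Overseer()))
```

The point of the sketch is only that evolution did nothing analogous to the probe-construction or overseer steps; even the crudest versions of these steps go beyond what the evolution analogy covers.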
“Evolution wasn’t trying to solve the robustness problem at all.”—Agreed that this makes the analogy weaker. And, to state the obvious, everyone doing safety work at MIRI and OpenAI agrees that there’s some way to do neglected-by-evolution engineering work that gets you safe+useful AGI, though they disagree about the kind and amount of work.
The docility analogy seems to be closely connected to important underlying disagreements.
Conversation also continues here.
What’s “AF” here?
I think Agent Foundations