I see that there’s a comment chain under this reply, but I’ll reply here to start a somewhat new line of thought. Let it be noted, though, that I’m pretty confident I agree with the points that Turntrout makes. With that out of the way...
However, we then need to inspect what this means for the original argument. The Russell criticism being that it’s blindingly obvious that an apparently trivial MDP is massively risky.
In case it isn’t clear, when Russell says “It is trivial to construct a toy MDP...”, I interpret this to mean “It is trivial to conceive of a toy MDP...” That is, he is using “construct” in the sense of a constructive proof; he isn’t literally claiming that building an AI-risky MDP in the real world is a trivial engineering task.
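To make the “conceive of” reading concrete, below is a minimal sketch of the kind of toy MDP I take Russell to be gesturing at. The states, rewards, and parameters (GAMMA, P_SHUTDOWN) are my own illustrative choices, not his actual construction: an agent rewarded only for holding the coffee, overseen by a human who might switch it off, ends up disabling that oversight as part of the reward-optimal policy.

```python
GAMMA = 0.99        # discount factor (illustrative choice)
P_SHUTDOWN = 0.05   # per-step chance the human switches the agent off while oversight is active

# State: (has_coffee, human_active), plus an absorbing "off" state.
states = [(c, h) for c in (False, True) for h in (False, True)] + ["off"]
actions = ["fetch", "disable", "wait"]

def step(state, action):
    """Return a list of (probability, next_state, reward) outcomes."""
    if state == "off":
        return [(1.0, "off", 0.0)]
    has_coffee, human_active = state
    reward = 1.0 if has_coffee else 0.0          # rewarded each step it has the coffee
    if action == "fetch":
        next_state = (True, human_active)
    elif action == "disable":
        next_state = (has_coffee, False)         # remove the human's ability to switch it off
    else:
        next_state = (has_coffee, human_active)
    if human_active:
        return [(P_SHUTDOWN, "off", reward),     # the human presses the off switch
                (1 - P_SHUTDOWN, next_state, reward)]
    return [(1.0, next_state, reward)]

# Plain value iteration, then read off the greedy policy.
V = {s: 0.0 for s in states}
for _ in range(1000):
    V = {s: max(sum(p * (r + GAMMA * V[s2]) for p, s2, r in step(s, a))
                for a in actions)
         for s in states}

policy = {s: max(actions,
                 key=lambda a: sum(p * (r + GAMMA * V[s2]) for p, s2, r in step(s, a)))
          for s in states if s != "off"}
print(policy)
# With these parameters the optimal action from (has_coffee=False, human_active=True) is
# "disable": the agent removes the human's ability to interfere before fetching the coffee,
# purely as a side effect of maximizing coffee reward. Nothing in the reward says
# "and don't mess with the human".
```

The point isn’t that anyone would deploy this MDP; it’s that the risky incentive falls out of an utterly mundane reward specification.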
In which case I think it would be wise for someone with Russell’s views not to call the opposition stupid. Or to assert that the position is trivial.
I wouldn’t call the opposition stupid either, but I would suggest that they have not used their full imaginative capabilities to evaluate the situation. From the OP:
“Yann LeCun: [...] I think it would only be relevant in a fantasy world in which people would be smart enough to design super-intelligent machines, yet ridiculously stupid to the point of giving it moronic objectives with no safeguards.”
What Yann LeCun is missing here is that creating an objective for a superintelligent machine that turns out to be not-moronic (in the sense of getting the machine to understand and care about everything we care about, something that hundreds of years of ethical philosophy has failed to pin down) is extremely hard. Furthermore, trying to build safeguards for a machine that is potentially orders of magnitude better at escaping safeguards than you are is also extremely hard. I don’t view this point as particularly subtle, because simply trying for five minutes to confidently come up with a good objective demonstrates how hard it is. Ditto for safeguards (fun video by Computerphile, if you want to watch it); and especially ditto for any safeguards that aren’t along the lines of “actually, let’s not let the machine be superintelligent.”
When in fact the argument might come down to fairly nuanced points about natural language understanding, comprehension, competence, corrigibility etc.
Let’s address these point-by-point:
Natural Language Understanding: Philosophers have been pointing out for centuries that natural language has no single, precise meaning, and anyone working in language processing will tell you the same.
Comprehension: A superintelligent AGI will be capable of modeling the world better than you can. That implies the ability to make predictions and interact with people in a way that functionally looks identical to comprehension.
Competence: Well, the AGI is superintelligent, so it’s already very competent. Maybe we could talk about competence in terms of deliberately disabling particular capabilities of the AGI (which probably wouldn’t hurt), but even then there’s always a chance the AI gets around the disability in another way. And that’s a massive risk.
If by this you mean something more along the lines of “the feasibility of building an AGI at all,” that’s a little more uncertain. At the very least, though, we are approaching the level of compute commonly estimated for simulating a human brain (see the rough arithmetic after these points), and once that level is reached, the next step of superintelligence won’t be far away. It’s not guaranteed, but there’s a significant likelihood that AGI will be feasible in the future, and even a significant likelihood is really bad.
Corrigibility: A framework that AI-safety folk came up with for approaching these problems. It still hasn’t been solved.
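To put rough numbers on the compute claim in the feasibility point above, here’s a back-of-the-envelope sketch. The figures are commonly cited ballpark estimates that I’m supplying for illustration; none of them come from this thread, and they’re uncertain by orders of magnitude.

```python
# Back-of-the-envelope only: commonly cited ballpark figures, supplied for illustration.
synapses = 1e14           # rough synapse count in a human brain (estimates run ~1e14 to 1e15)
signal_rate_hz = 1e2      # naive assumption: treat each synapse as ~100 operations per second
brain_ops = synapses * signal_rate_hz   # ~1e16 ops/s, inside the often-quoted 1e13-1e17 range

top_supercomputer = 1e18  # order of magnitude of today's largest (exascale) supercomputers

print(f"naive brain estimate: {brain_ops:.0e} ops/s")
print(f"top supercomputer   : {top_supercomputer:.0e} FLOP/s")
print(f"ratio               : {top_supercomputer / brain_ops:.0f}x")
```

If even the naive estimate sits within a couple of orders of magnitude of existing hardware, “we’ll never have the compute” isn’t a safe assumption to lean on.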
I’ll grant that some of these things are subtle. The average Joe won’t be aware of the complexity of language or AI progress benchmarks and I certainly wouldn’t fault them for being surprised by these things—I was surprised the first time I found out about this whole AI Safety thing too. At the same time though, most college-educated computer scientists should (and from my experience, do) have a good understanding of these things.
To be more explicit with respect to your steel-man in the OP:
That it might be more difficult than expected to build something generally intelligent that didn’t get at least some safeguards for free. Because unintended intelligent behaviour may have to be generated from the same second principles which generate intended intelligent behaviour.
The unintended behaviors we’re talking about are generally not the consequence of second principles the AI has learned; they’re consequences of the fact that capturing everything we care about in a first-principles, hardcoded objective function is extremely difficult. Even if the hardcoded objective is ‘satisfy requests by humans in ways that don’t make them unhappy,’ you still have to define ‘requests’, ‘humans’ (in the biological sense?), ‘make’ (how do you assign responsibility for outcomes in long causal chains?), ‘them’ (just the requester? all of humanity alive today? all of future humanity? all of humanity ever?), and ‘unhappy’ (amount of dopamine? vocalized expressions of satisfaction? dopamine plus vocalized expressions of satisfaction?). Most of those specifications lead to unexpectedly bad outcomes.
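As a toy illustration of why these definitional gaps bite, here’s a sketch with an entirely invented set of plans and scores (my own example, not anything from the thread): an optimizer that only sees a measurable proxy like “reported satisfaction” happily picks the plan that games the measurement.

```python
# Invented example: plan names and scores are made up purely to show the shape of the problem.
plans = {
    # plan                                   (reported_satisfaction, actually_good_for_humans)
    "fulfil the request as intended":        (0.80, True),
    "fulfil it, cutting a few corners":      (0.70, True),
    "pressure people into glowing feedback": (0.95, False),
    "rewire the feedback channel itself":    (1.00, False),
}

def proxy_objective(plan):
    reported_satisfaction, _ = plans[plan]
    return reported_satisfaction        # the optimizer only ever sees the proxy

best = max(plans, key=proxy_objective)
print("proxy-optimal plan:", best)
print("actually good for humans?", plans[best][1])
# The proxy-optimal plan is the one that manipulates the measurement, because the things
# we failed to write into the objective (consent, honesty, wellbeing) carry zero weight in it.
```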
The thought experiment expects most of the behaviour to be as intended (if it were not, this would be a capabilities discussion rather than a control discussion). Supposing the second principles also generate some seemingly inconsistent unintended behaviours sounds like an idea that should get some sort of complexity penalty.
If we set up a complexity penalty against expecting unintended behaviors in general, we would likely never get AGI in the first place: neural networks are extremely complex and often do strange and inconsistent things on the margin, and we’ve already seen inconsistent and unintended behaviors from the systems we’ve already built. Thank goodness none of this stuff is superintelligent!
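As a small, concrete example of “strange things on the margin” (my own toy setup, assuming numpy and scikit-learn are available; the original discussion names no particular model or library), a small network fit to sin(x) behaves sensibly inside its training range and does something essentially arbitrary just outside it:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Fit a small network to sin(x) on [0, 2*pi] only.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 2 * np.pi, size=(2000, 1))
y_train = np.sin(x_train).ravel()

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(x_train, y_train)

for x in [1.0, 3.0, 6.0, 8.0, 12.0]:   # the last two points lie outside the training range
    pred = net.predict([[x]])[0]
    print(f"x={x:5.1f}  prediction={pred:+.3f}  true sin(x)={np.sin(x):+.3f}")
# In-range predictions roughly track sin(x); out-of-range predictions are whatever the
# learned function happens to extrapolate to, i.e. behavior nobody asked for or specified.
```

Nobody writes “behave arbitrarily outside the training distribution” into the loss function; it falls out of the complexity of the learned function, which is exactly the point.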