Nah, this is way too dark and gritty.
Yes, obviously we should be consequentialists in what we emphasize. Yes, people in political power are precisely those selected for thinking that “but what if the AI just maximized my values?” sounds extra-great.
But this worse-than-death dystopia really comes out of nowhere. Who is it that actually prefers that future? Why does torturing people give you a competitive edge when there’s superintelligent AI running around, ready to give a huge competitive edge to whoever asks the right questions?
There are certainly sci-fi answers to these questions (the bad guy actively prefers suffering, torturing humans is an efficient way to get things done because we need uploads to do all our work for us), but they don’t seem likely in real life. If we succeed technically but fail spectacularly at coordination, the most likely outcome—aside from death for all because political mechanisms often fail at implementing technical solutions—seems better than the status quo, because most humans want good things to happen for humanity in the abstract.
Tons of people? Xenophobes, homophobes, fascists, religious fanatics, elitists of various flavors. Some governments are run by such people; some countries have a majority of such people.
I’m not saying torture gives you a competitive edge (where did I say that?), I’m saying a lot of people genuinely prefer terrible fates for their outgroups. And while, sure, getting exposed to said outgroups may change their minds, it’s not their current values, and the AI wouldn’t care about their nice counterfactual selves who’d learned the value of friendship. The AI would just enforce their current reflectively endorsed preferences.
Even religious fanatics I’d call incoherent more than malicious. Sure, the Taliban want unbelievers to be punished, but they also want God to be real and for the unbelievers to convert to the true faith.
When you talk about their “current values” without any process of growth, I don’t think there’s any there there—it’s a big mess, not a utility function. Talking about good processes of growth is a vital part of getting an AI to do something that looks like “what you want.”
Okay, maybe you could get to dystopia without just killing everyone by building an AI that tries to do some very specific thing (“maintain US military supremacy”), but only in the way that people typically imagine that very specific thing (it can’t just kill all humans and maintain empty US military bases). But mostly I’d expect we’d just die.
When you talk about their “current values” without any process of growth, I don’t think there’s any there there—it’s a big mess, not a utility function.
Sure, yes, exactly my point. The problem is, you don’t need to untangle this mess, or care about having coherent values, to tell an AGI to do things. It’s not going to loop back to you and complain that what you’re telling it to do is incoherent, inasmuch as you’ve solved the control problem and successfully made it do what you want. It’ll just do what you want, the way you’re imagining it, however incoherent it is.
“Maintain US military supremacy the way I typically imagine it” is, in fact, the primary use-case I have in mind, not a weird, unlikely exception.
Talking about good processes of growth is a vital part of getting an AI to do something that looks like “what you want.”
How so? I have wants now. Why do I have to do some kind of “growth” for these wants to become legitimate? What’d prevent an AGI from understanding them as they are now?