Tons of people? Xenophobes, homophobes, fascists, religious fanatics, elitists of various flavors. Some governments are run by such people; some countries have a majority of such people.
I’m not saying torture gives you a competitive edge (where did I say that?), I’m saying a lot of people genuinely prefer terrible fates for their outgroups. And while, sure, getting exposed to said outgroups may change their minds, those aren’t their current values, and the AI wouldn’t care about their nice counterfactual selves who’d learned the value of friendship. The AI would just enforce their current reflectively endorsed preferences.
Even religious fanatics I’d call incoherent more than malicious. Sure, the Taliban want unbelievers to be punished, but they also want God to be real and for the unbelievers to convert to the true faith.
When you talk about their “current values” without any process of growth, I don’t think there’s any there there—it’s a big mess, not a utility function. Talking about good processes of growth is a vital part of getting an AI to do something that looks like “what you want.”
Okay, maybe you could get to dystopia without just killing everyone by building an AI that tries to do some very specific thing (“maintain US military supremacy”), but only in the way that people typically imagine that very specific thing (can’t just kill all humans and maintain empty US military bases). But mostly I’d expect we’d just die.
When you talk about their “current values” without any process of growth, I don’t think there’s any there there—it’s a big mess, not a utility function.
Sure, yes, exactly my point. The problem is, you don’t need to untangle this mess, or care about having coherent values, to tell an AGI to do things. It’s not going to loop back to you and complain that what you’re telling it to do is incoherent, inasmuch as you’ve solved the control problem and successfully made it do what you want. It’ll just do what you want, the way you’re imagining it, however incoherent it is.
“Maintain US military supremacy the way I typically imagine it” is, in fact, the primary use-case I have in mind, not a weird, unlikely exception.
Talking about good processes of growth is a vital part of getting an AI to do something that looks like “what you want.”
How so? I have wants now. Why do I have to undergo some kind of “growth” for these wants to become legitimate? What’d prevent an AGI from understanding them as they are now?