What concerns me most is the lack of any coherent effort, anywhere, toward solving the biggest problem: identifying a goal (a value system, utility function, decision theory, decision architecture...) suitable for an autonomous superhuman AI.
In these discussions, Coherent Extrapolated Volition (CEV) is the usual concrete formulation of what such a goal might be. But I’ve now learned that MIRI’s central strategy is not to finish figuring out the theory and practice of CEV; that’s considered too hard (see item 24 in this post). Instead, the hope is to use safe AGI to freeze all unsafe AGI development everywhere, for long enough that humanity can properly figure out what to do. Presumably this freeze (the “pivotal act”) would be carried out by whichever government, corporation, or university crossed the AGI threshold first; ideally, a consensus might even emerge among many of the contenders that this is the right thing to do.
I think it’s very appropriate that some thought along these lines be carried out. If AGI is a threat to the human race, and it arrives before we know how to safely set it free, then we will need ways to try to neutralize that dangerous potential. But I also think it’s vital that we try to solve that biggest problem, e.g. by figuring out how to concretely implement CEV. And if one is concerned that this is just too much for human intellect to figure out, remember that AI capabilities are rising. If humans can’t figure out CEV unaided, maybe they can do it with the help of AI. To me, that’s the critical pathway that we should be analyzing.
P.S. I have many more thoughts on what this might involve, but I don’t know when I will be able to sort through them all. So for now I will just list a few people whose work strikes me as definitely or potentially relevant (certainly not a complete list): June Ku, Vanessa Kosoy, Jessica Taylor, Steven Byrnes, Stuart Armstrong.
There’s shard theory, which tries to describe the process by which values form in humans; the eventual aim is to understand value formation well enough to reproduce it in an AI system. I also think figuring out human values, value reflection, and moral philosophy might actually be a lot easier than we assume. For example, the continuous perspective on agency and values is pretty compelling to me and changes things a lot, IMO.
Here’s an outside-the-box suggestion:
Clearly the development of any AGI is an enormous risk. While I can’t back this up with any concrete argument, a couple of decades of working on math and CS problems gives me a gut intuition that statements like “I figure there’s a 50-50 chance it’ll kill us,” or even “there’s a 5-15% chance everything works out,” are wildly off. I suspect this is the sort of issue where the probability of survival gets funneled toward one extreme or the other: something more like either >0.9999 or <0.0001, of which the latter currently seems far more likely.
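To make that “funneling” intuition a bit more concrete, here is one toy conjunctive model (my own illustration, with made-up numbers, not a claim about the actual odds): if a good outcome requires many roughly independent things to all go right, the overall probability is a product of per-step probabilities, which lands near an extreme rather than near 50-50 unless every step is nearly certain.

```python
# Toy model (hypothetical numbers): treat survival as a conjunction of
# n_steps roughly independent requirements that must all go right.
def overall_survival(per_step_success: float, n_steps: int) -> float:
    """Probability that all n_steps independent requirements are met."""
    return per_step_success ** n_steps

for p in (0.9999, 0.99, 0.9, 0.5):
    print(f"per-step {p}: overall over 50 steps = {overall_survival(p, 50):.2e}")

# per-step 0.9999 -> ~9.95e-01
# per-step 0.99   -> ~6.05e-01
# per-step 0.9    -> ~5.15e-03
# per-step 0.5    -> ~8.88e-16
```

The exact numbers don’t matter; the point is that modest differences in per-step reliability swing the product toward one extreme or the other, which is why headline figures like “roughly 50-50” strike me as implausible.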
Has anyone discussed the concept of deliberately trying to precipitate a global nuclear war? I’m half kidding, but only half: if the risk really is as great, as imminent, and as potentially final as many here suspect, then a near-extinction event like that (one that would presumably wipe out the infrastructure for GPU farms for a long time to come, without actually wiping out the human race) could buy time to work the problem, or at least pass the buck to our descendants, and could conceivably be preferable.
Obviously, it’s too abhorrent to be a real solution, but it does have one distinct advantage: it could be done today, if the right people wanted to do it. That matters because I’m not at all convinced we’ll recognize a powerful AGI when we see it, given, for instance, how cavalierly everyone dismisses large language models as nothing more than a sophisticated parlor trick.
Just want to clarify: this isn’t me; I didn’t write this. My last name isn’t Cappallo. I didn’t find out about this comment until today, when I did a Ctrl+F to find a comment I wrote around the time this was posted.
I’m the victim here; in fact, I have written at length about the weaponization of internet randos to manipulate people’s perceptions.
I confess I am perplexed, as I suspect most people are aware there is more than one Trevor in the world. As you point out, that is not your last name. I have no idea who you are, or why you feel this is some targeted “weaponization.”
What weaponization? It would seem very odd to describe yourself as being the “victim” of someone else having the same first name as you.