Superhuman agents these days are all built up out of humans talking to each other. That helps a lot for their alignability, in multiple ways. For an attempt to transfer this secret sauce to an AI agent, see Iterated Distillation and Amplification, which as I understand it works by basically making a really good human-imitator, then making a giant bureaucracy of them, and then imitating that bureaucracy & repeating the process.
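For concreteness, here's a minimal sketch of that loop as I understand it, assuming nothing beyond the description above. The function names (`train_human_imitator`, `amplify`, `distill`) are hypothetical placeholders standing in for the actual training procedures, not a real implementation or library API.

```python
# Hypothetical sketch of the IDA loop described above; the three helper
# functions are placeholders, not real library calls.

def train_human_imitator(demonstrations):
    """Train a fast model that imitates a human overseer (the starting point)."""
    raise NotImplementedError

def amplify(model, n_copies):
    """Compose many copies of the model into a 'bureaucracy' that decomposes
    a task, delegates subtasks to copies, and aggregates their answers."""
    raise NotImplementedError

def distill(amplified_system):
    """Train a single fast model to imitate the slower, stronger amplified system."""
    raise NotImplementedError

def ida(demonstrations, rounds=3, n_copies=100):
    model = train_human_imitator(demonstrations)
    for _ in range(rounds):
        bureaucracy = amplify(model, n_copies)  # stronger, but slow
        model = distill(bureaucracy)            # fast again; the hope is that alignment is preserved
    return model
```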
The AIs we will soon build will be superhuman in new ways, ways that no current superhuman agent enjoys. (See e.g. Bostrom’s breakdown of speed, quality, and collective intelligence: current organizations are superhuman in “collective” but human-level in speed and quality.)
To answer your question, no, I’d feel pretty good about Paul or Eliezer or me being uploaded. If it were a random human being instead of one of those three, I’d still think things would probably be OK, though there’d be a still-too-large chance of catastrophe.
humans talking to each other already has severe misalignment. ownership exploitation is the primary threat folks seem to fear from ASI: “you’re made of atoms the ai can use for something else” ⇒ “you’re made of atoms jeff bezos and other big capital can use for something else”. I don’t think point 1 holds strongly. youtube is already misaligned; it’s not starkly superhuman, but it’s much better at selecting superstimuli than most of its users. hard asi would amplify all of these problems immensely, but because they aren’t new problems, I do think seeking formalizations of inter-agent safety is a fruitful endeavor.
Oh I agree with all that. I said “it helps a lot for their alignability” not “they are all aligned.”
makes sense, glad we had this talk :thumbsup: