Another thing I’d add—putting this in its own comment to help avoid any one thread blowing up in complexity:
The orientation-towards-clarity problem is at the very least strongly analogous to, and most likely actually an important special case of, the AI alignment problem.
Friendliness is strictly easier with groups of humans, since the orthogonality thesis is false for humans—if you abuse us out of our natural values you end up with stupider humans and groups. This is reason for hope about FAI relative to UFAI, but also a pretty strong reason to prioritize developing a usable decision theory and epistemology for humans over using our crappy currently-available decision theory to direct resources in the short run towards groups trying to solve the problem in full generality.
AGI, if it is ever built, will almost certainly be built—directly or indirectly—by a group of humans, and if that group is procedurally Unfriendly (as opposed to just foreign), there’s no reason to expect the process to correct to FAI. For this reason, friendly group intelligence is probably necessary for solving the general problem of FAI.
I’m not sure I agree with all the details of this (it’s not obvious to me that humans are friendly if you scale them up) but I agree that the orientation towards clarity likely has important analogues to the AI Alignment problem.
it’s not obvious to me that humans are friendly if you scale them up
It seems like any AI built by multiple humans coordinating is going to reflect the optimization target of the coordination process building it, so we had better figure out how to make this so.
Initially I replied to this with “yeah, that seems straightforwardly true”, then something about that felt off, and it took me a while to figure out why.
This:
It seems like any AI built by multiple humans coordinating is going to reflect the optimization target of the coordination process building it
...seems straightforwardly true.
This:
..., so we had better figure out how to make this so. [where “this” is “humans are friendly if you scale them up”]
Could unpack a few different ways. I still agree with the general sentiment you’re pointing at here, but I think the most straightforward interpretation of this is mostly false.
Humans are not scalably friendly, so many of the most promising forms of Friendly AI seem to _not_ be “humans who are scaled up”; instead they’re doing other things.
One example being CEV. (Which hopes that if you scale up ALL humans TOGETHER and make them think carefully as you do so, you get something good, and if it turns out that you don’t get something good that coheres, it gracefully fails and says “nope, sorry, this didn’t work.” But this is a different thing than scaling up any particular human or small group of humans.)
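(If it helps, here’s a very rough Python sketch of the structural point I mean, with entirely made-up interfaces rather than anything from an actual CEV proposal: the extrapolation runs over everyone together, and the graceful-failure branch is built in from the start.)

```python
from itertools import combinations

def toy_cev(humans, extrapolate, similarity, coherence_threshold=0.9):
    """Toy sketch of the *shape* of CEV (hypothetical interfaces, not any
    actual proposal): extrapolate everyone's volition together, and only
    return an answer if the results cohere; otherwise fail gracefully."""
    extrapolated = [extrapolate(h) for h in humans]

    # Do the extrapolated volitions actually cohere? Measured here as mean
    # pairwise similarity, which is of course a stand-in for something hard.
    pairs = list(combinations(extrapolated, 2))
    agreement = (sum(similarity(a, b) for a, b in pairs) / len(pairs)) if pairs else 1.0

    if agreement < coherence_threshold:
        return None  # graceful failure: "nope, sorry, this didn't work"

    # Hand-wave the aggregation as averaging over shared value dimensions.
    keys = extrapolated[0].keys()
    return {k: sum(v[k] for v in extrapolated) / len(extrapolated) for k in keys}
```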
Iterated Amplification seems to more directly depend on humans being friendly as you scale them up, or at least some humans being so.
I am in fact pretty wary of Iterated Amplification for that reason.
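(Rough sketch of why, in made-up Python rather than the actual scheme: the amplify/distill loop re-injects the human judge’s dispositions at every round, so whatever that judge actually values, friendly or not, is what gets scaled.)

```python
def amplify(human_judge, model):
    """One amplification step: the human answers a question with help from
    copies of the current model (loosely the HCH / IDA picture; the
    decompose/combine interface here is hypothetical)."""
    def amplified_judge(question):
        subquestions = human_judge.decompose(question)
        subanswers = [model(q) for q in subquestions]
        return human_judge.combine(question, subanswers)
    return amplified_judge

def iterated_amplification(human_judge, distill, model, rounds):
    """Toy loop: amplify the human with the current model, then distill the
    amplified judge into the next model. Note that human_judge's values and
    blind spots are copied forward on every single iteration."""
    for _ in range(rounds):
        model = distill(amplify(human_judge, model))
    return model
```

Which is the sense in which “some humans being friendly as you scale them up” seems load-bearing for that approach.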
The whole point of CEV, as I understand it, is to figure out the thing you could build that is actually robust to you not being friendly yourself. The sort of thing that, if the ancient Greeks were building it, you could possibly hope for them to figure out so that they didn’t accidentally lock the entire lightcone into a Bronze Age warrior ethos.
...
“You can’t build friendly AI without this”
You and Zack have said this (or something like it) on occasion, and fwiw I get a fairly political red flag from the statement. Which is not to say I don’t think the statement is getting at something important. But I notice each group I talk to has a strong sense of “the thing my group is focused on is the key, and if we can’t get people to understand that, we’re doomed.”
I have periodically noticed myself saying (and thinking), “if we can’t get people to understand each other’s frames and ontologies, we autolose. If we can’t get people to jointly learn how to communicate and listen non-defensively and in ways that don’t provoke defensiveness (i.e. the paradigm I’m currently pushing), we’re doomed.”
But, when I ask myself “is that really true? Is it sheer autolose if we don’t all learn to doublecrux and whatnot?” No. Clearly not. I do think losing becomes more likely. I wouldn’t be pushing my preferred paradigm if I didn’t think that paradigm was useful. But the instinct to say “this is so important that we’re obviously doomed if everyone doesn’t understand and incorporate this” feels to me like something that should have a strong prior of “your reason for saying that is to grab attention and build political momentum.”
(and to be clear, this is just my current prior, not a decisive argument. And again, I can certainly imagine human friendliness being crucial to at least many forms of AGI, and being quite useful regardless. Just noting that I feel a need to treat claims of this form with some caution.)
Hmm – I do notice, one comment of yours up, you note: “if that group is procedurally Unfriendly (as opposed to just foreign), there’s no reason to expect the process to correct to FAI.” Something about this phrasing suggests you might be using the terms friendly/unfriendly/foreign in ways that don’t quite map to how I was using them.
Noting this mostly as “I’m updating a bit towards my previous comment not quite landing in your ontology” (which I’m trying to get better at tracking).