The capabilities people have thrown a challenge right back at you!
Convincing All Alignment Researchers
Give the world’s hundred most respected AI alignment researchers $1M each to spend 3 months explaining why AI will be misaligned, with an extra $100M if by the end they can propose an argument capability researchers can’t shoot down. They probably won’t make any progress, but from then on when others ask them whether they think alignment is a real unsolved problem, they will be way more likely to say no. That only costs you a hundred million dollars!
I don’t think I could complete this challenge, yet I also predict that I would not then say that alignment is not a real unsolved challenge. I mostly expect the same problem from the OP proposal, for pretty similar reasons.
For the record I think this would also be valuable! If as an alignment researcher your arguments don’t survive the scrutiny of skeptics, you should probably update away from them. I think maybe what you’re highlighting here is the operationalization of “shoot down”, which I wholeheartedly agree is the actual problem.
Re: the quantities of funding, I know you’re being facetious, but just to point it out, the economic values of “capabilities researchers being accidentally too optimistic about alignment” and of “alignment researchers being too pessimistic about alignment” are asymmetric.
If as an alignment researcher your arguments don’t survive the scrutiny of skeptics, you should probably update away from them.
If that’s your actual belief, you should probably update away now. People have tried to do this for years, and in fact in most cases the skeptics were not convinced.
Personally, I’m much more willing to say that they’re wrong and so I don’t update very much.
Re: the quantities of funding, I know you’re being facetious, but just to point it out, the economic values of “capabilities researchers being accidentally too optimistic about alignment” and of “alignment researchers being too pessimistic about alignment” are asymmetric.
Yeah, that was indeed just humor, and I agree with the point.
I was referring to your original statements. I think you might be construing my statement as that you should take AI risk less seriously if you can’t convince the skeptics, as opposed to if the skeptics can’t convince you.
Ah, I see, that makes more sense, sorry for the misunderstanding.
(Fwiw I and others have in fact talked with skeptics about alignment.)
Hmmm, I wonder if there’s a version like this that would actually be decent at ‘getting people to grapple with’ the ideas? Like, if you got Alignment Skeptic Alice to judge the ELK prize submissions with Paul Christiano and Mark Xu (presumably by paying her a bunch of money), would Alice come away from the experience thinking “actually ELK is pretty challenging” or would she come away from it thinking “Christiano and Xu are weirdly worried about edge cases that would never happen in reality”?
would Alice come away from the experience thinking “actually ELK is pretty challenging” or would she come away from it thinking “Christiano and Xu are weirdly worried about edge cases that would never happen in reality”?
If Alice is a skeptic because she actually has thought about it for a while and come to an opinion, it will totally be the latter.
If Alice has just not thought about it much, and she believes that AGI is coming, then she might believe the former. But in that case I think you could just have Alice talk to e.g. me for, say, 10 hours, and I’d have a pretty decent chance of convincing her that we should have more investment in AGI alignment.
(Or to put it another way: just actually honestly debating with the other person seems a lot more likely to work than immersing them in difficult alignment research—at least as long as you know how to debate with non-rationalists.)
Your time isn’t scalable, though; there are well over 10,000 replacement-level AI researchers.
Of course, but “judge the ELK prize submissions with Paul Christiano and Mark Xu” is also not a scalable approach, since it requires a lot of conversation with Paul and Mark?
I think we agree that scaling is relative and more is better!
God, absolutely, yes, do I get to talk to the sceptic in question regularly over the three months?
Given three months of dialogue with someone who thinks like me about computers and maths, in which we both promise to take each other’s ideas seriously, if I haven’t changed his mind far enough to convince him that there are serious reasons to be scared, he will have changed mine.
I have actually managed this with a couple of sceptic friends, although the three months of dialogue has been spread out over the last decade!
And I don’t know what I’m talking about. Are you seriously saying that our best people can’t do this?! Eliezer used to make a sport of getting people to let him out of his box, and he has always been really, really good at explaining complicated thoughts persuasively.
Maybe our arguments aren’t worth listening to. Maybe we’re just wrong.
Give me this challenge!! Nobody needs to pay me, I will try to do this for fun and curiosity with anyone on the other side who is open-minded enough to commit to regular chats. An hour every evening?
In person would be better, so Cambridge or maybe London? I can face the afternoon train for this.
I think it’s totally doable (and I have done it myself) to convince people who haven’t yet staked a claim as an Alignment Skeptic. There are specific people, such as Yann LeCun, who are publicly skeptical of alignment research; they are the ones I imagine we could not convince.
OK, I myself am sceptical of alignment research, but not at all sceptical of the necessity for it.
Do you think someone like Eliezer has had a proper go at convincing him that there’s a problem? Or will he just not give us the time of day? Has he written anything coherent on the internet that I could read in order to see what his objections are?
Personally I would love to lose my doom-related beliefs, so I’d like to try to understand his position as well as I can for two reasons.
Here’s an example.
Great, thanks, so I’m going to write down my response to his thoughts as I hear them:
Before reading the debate I read the Scientific American article it’s about. On a first read it seems convincing: OK, relax! And then take a closer look.
What’s he saying (paraphrasing stuff from Scientific American):
Superintelligence is possible, can be made to act in the world, and is likely coming soon.
Intelligence and goals are decoupled.
Why would a sentient AI want to take over the world? It wouldn’t.
Intelligence per se does not generate the drive for domination.
Not all animals care about dominance.
I’m a bit worried by mention of the first law of robotics. I thought the point of all those stories was all the ways such laws might lead to weird outcomes.
Blah blah joblessness, military robots, blah inequality, all true, I might even care if I thought there was going to be anyone around to worry about it. But it does mean that he’s not a starry-eyed optimist who thinks nothing can go wrong.
That’s great! I agree with all of that, it’s often very hard to get people that far. I think he’s on board with most of our argument.
And then right at the end (direct quote):
Even in the worst case, the robots will remain under our command, and we will have only ourselves to blame.
OK, so he thinks that because you made a robot, it will stay loyal to you and follow your commands. No justification given.
It’s not really fair to close-read an article in a popular magazine, but at this point I think maybe he realises that you can make a superintelligent wish-granting machine, but hasn’t thought about what happens if you make the wrong wish and want to change it later.
(I’m supposed to not be throwing mythological and literary references into things any more, but I can’t help but think about the Sibyl, rotting in her jar because Apollo had granted her eternal life but not eternal youth, or T. S. Eliot’s “That is not it at all, That is not what I meant, at all.”)
So let’s go and look at the debate itself, rather than the article.
I was talking to someone recently who had talked to Yann and got him to agree with very alignment-y things, but then a couple of days later Yann was saying very capabilities-y things instead.
The “someone”’s theory was that Yann’s incentives and environment all push towards capabilities research.