I do like the core concept here, but I think for it to work you need to have a pretty well specified problem that people can’t weasel out of. (I expect the default result of this to be “1000 researchers all come up with reasons they think they’ve solved alignment, without really understanding what was supposed to be hard in the first place.”)
You touch upon this in your post but I think it’s kinda the main blocker.
I do think it might be a surmountable obstacle, though.
I agree. I imagine a disagreement where the person says “you can’t prove my proposal won’t work” and the alignment judge says “you can’t prove your proposal will work”, and then there’s some nasty, very-hard-to-resolve debate about whether AGI is dangerous by default or not, involving demands for more detail, and then the person says “I can’t provide more detail because we don’t have AGI yet”, etc.
I think of the disagreement between Paul and Eliezer about whether IDA was a promising path to safe AGI (back when Paul was more optimistic about it); both parties there were exceptionally smart and knowledgeable about alignment, and they still couldn’t reconcile.
Giving people a more narrow problem (e.g. ELK) would help in some ways, but there could still be disagreement over whether solving that particular problem in advance (or at all) is in fact necessary to avert AGI catastrophe.
I’ve seen proposals of the form “Eliezer and Paul both have to agree the researchers have solved alignment to get the grand prize”, which seems better than not-that, but still seems insufficiently legibly fair to make it work with 1000 researchers.
I have also seen some partially formalized stabs at something like “interpretability challenges”, somewhat inspired by Auditing Games, where there are multiple challenges (i.e. bronze, silver and gold awards), and the bronze challenge is meant to be something achievable by interpretability researchers within a couple of years, and the gold challenge is meant to be something like “you can actually reliably detect deceptive adversaries, and other key properties a competent civilization would have before running dangerously powerful AGI.”
This isn’t the same as an “alignment prize”, but might be easier to specify.
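To illustrate the structure (not the content), here is a minimal sketch of how such a tiered challenge spec might be written down. Everything in it is a hypothetical placeholder: the criteria strings are loose paraphrases, the silver tier in particular is made up, and `highest_tier_passed` is just an invented helper.

```python
# Purely illustrative sketch of a tiered "interpretability challenge" spec.
# Tier names, criteria, and the helper below are hypothetical placeholders,
# not taken from any actual proposal.
from dataclasses import dataclass

@dataclass
class ChallengeTier:
    name: str
    criterion: str          # informal statement of what counts as passing
    intended_horizon: str   # rough difficulty / timeline the tier is aimed at

TIERS = [
    ChallengeTier(
        name="bronze",
        criterion="pass a scoped auditing-game-style task on a small model",
        intended_horizon="achievable by interpretability researchers within a couple of years",
    ),
    ChallengeTier(  # made-up intermediate tier, just to show the escalation
        name="silver",
        criterion="same, but on larger models with adversarially hidden behaviors",
        intended_horizon="intermediate milestone",
    ),
    ChallengeTier(
        name="gold",
        criterion="reliably detect deceptive adversaries and other key safety properties",
        intended_horizon="what you'd want before running dangerously powerful AGI",
    ),
]

def highest_tier_passed(results: dict[str, bool]) -> str | None:
    """Return the name of the highest tier marked as passed, in TIERS order."""
    passed = None
    for tier in TIERS:
        if results.get(tier.name, False):
            passed = tier.name
    return passed

# Example: a team that has passed bronze and silver but not gold.
print(highest_tier_passed({"bronze": True, "silver": True, "gold": False}))  # -> silver
```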
I don’t think a well-specified problem is the way to go.
Instead, choose a group of judges, and if a majority of them think something is progress, then that counts as a win. Those are the rules, and no arguing afterwards.
But to make this fair, the judges should be available to discuss possible research directions, and what they would and would not consider progress, before and during the 3 months of research. Or, if that’s not possible due to the scale, maybe a group of people who are good at predicting what the real judges would say could do this instead.
Besides, giving out the money and not staying in touch during the 3 months seems like a bad idea anyway.
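To be concrete, a toy sketch of the decision rule I have in mind is below. The panel size, judge names, and verdict format are placeholders, not part of any actual proposal.

```python
# Toy sketch of the judging rule: fix a panel in advance, a strict majority of
# "this is progress" verdicts counts as a win, and there is no appeals step.
# Panel size and judge names are hypothetical placeholders.

JUDGES = ["judge_a", "judge_b", "judge_c", "judge_d", "judge_e"]  # chosen before the program starts

def is_winning_submission(verdicts: dict[str, bool]) -> bool:
    """verdicts maps each judge's name to True iff they consider the work progress."""
    votes_for = sum(1 for judge in JUDGES if verdicts.get(judge, False))
    return votes_for > len(JUDGES) / 2  # strict majority; a missing verdict counts as "no"

# Example: 3 of 5 judges call it progress -> counts as a win, no further argument.
example = {"judge_a": True, "judge_b": True, "judge_c": True, "judge_d": False, "judge_e": False}
print(is_winning_submission(example))  # True
```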
I agree that actual judges are better for actually solving the problem; the question is whether you can legibly scale the reputation of those judges to 1000 researchers who aren’t already bought into the judges’ paradigm.
I don’t know. I would start small, and worry about scaling later.
If this is worth doing at all, it is also worth doing at a smaller scale, so it would not be a waste if the project doesn’t scale.
Also, I suspect that quality is more important than quantity here. Doing a really good job running this program with the 50 most influential AI researchers might have a better payoff than doing a half-hearted job with the top 1000 researchers.
Oh, maybe, but it seemed like the specific proposal here was the scale. It seemed like hiring a few people to do AI research was… sort of just the same as all the hiring and independent research that’s already been happening. The novel suggestion here is hiring less-aligned people at a scale that would demonstrate to the rest of the world that the problem was really hard. (Not sure exactly what the OP meant, and not sure what specific new element you were interested in.)
I guess a question here is what the returns-to-scale curve looks like? I’d be surprised if the 501st-1000th researchers were more valuable than the 1st-500th, suggesting there is a smaller version that’s still worth doing.
I don’t know where this guess comes from, but my guess is that the curve is increasing up to somewhere between 10 and 100 researchers, and decreasing after that. But also there are likely to be threshold effects at round/newsworthy numbers?
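To illustrate the shape I’m gesturing at, here’s a toy curve with entirely made-up numbers; the peak location, decay rate, and bonus size are arbitrary, just to show “rising early, falling later, with bumps at newsworthy totals”.

```python
# Toy returns-to-scale curve with made-up numbers, just to illustrate the guessed shape:
# marginal value rises up to a peak (~30 researchers here), declines afterwards,
# and round/newsworthy totals (100, 500, 1000) get an extra "threshold" bump.
import math

PEAK = 30                      # hypothetical point where marginal value starts declining
NEWSWORTHY = {100, 500, 1000}  # totals assumed to carry extra signalling value

def marginal_value(n: int) -> float:
    """Value added by the n-th researcher (arbitrary units)."""
    if n <= PEAK:
        base = n / PEAK                      # increasing returns early on
    else:
        base = math.exp(-(n - PEAK) / 300)   # slow decay after the peak
    bonus = 0.5 if n in NEWSWORTHY else 0.0  # threshold effect at round numbers
    return base + bonus

def total_value(n: int) -> float:
    return sum(marginal_value(i) for i in range(1, n + 1))

# Under these made-up numbers the first 500 researchers contribute several times
# more than researchers 501-1000, i.e. a smaller version is still worth doing.
print(round(total_value(500), 1), round(total_value(1000) - total_value(500), 1))
```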
That does sound right-ish.
I’m not going to speculate about who meant what. But for me the new and interesting idea was to pay people to do research in order to change their minds, as opposed to paying people to do research in order to produce research.
As far as I know, all the current hiring and paying of independent researchers is directed towards people who already believe that AI Safety research is difficult and important. Paying people who are not yet convinced is a new move (as far as I know), even at a small scale.
I guess it is currently possible for an AI risk sceptic to get an AI Safety research grant. But none of the grants are designed for this purpose, right? I think the format of a very highly paid (more than they would earn otherwise), very short (not a major interruption to ongoing research) offer, with the possibility of a prize at the end, is more optimised to get sceptics on board.
In short, the design of a funding program will be very different when you have a different goal in mind.