I’m not sure I see the point of awarding an already-in-the-works 67-page paper that happened to be released at the time of the competition, if the goal of the prize is to stimulate AI work that would not otherwise have happened.
Personally, my long-term goal is a world where high-quality work on alignment is consistently funded, and where people doing high-quality work on alignment have plenty of money. I think that an effort to restrict to counterfactually-additional alignment work would “save” some money (in the sense that I’d have the money rather than some researcher who is doing alignment work) but wouldn’t be great for that long-term goal.
Also, if you actually think about the dynamics, they are pretty crappy, even if you only avoid “obvious” cases. For example, it would become really hard for anyone to actually assess counterfactual impact, since every winner would need to make it look like there was at least a plausible counterfactual impact. (I already wish there was less implicit social pressure in that direction.)
On reflection, I strongly agree that social pressure around counterfactualness is a net harm to motivation.
I think you want to reward output, rather than only rewarding output that would not otherwise have happened.
This is similar to how, if you want to train calibration, you have to optimize your log score and just treat your lack of calibration as an opportunity to increase your log score.
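For concreteness, here is a minimal sketch of why that works (the numbers and simulation are illustrative, not from the thread): the log score is a proper scoring rule, so in expectation it is maximized by reporting your true probabilities, and any miscalibration shows up directly as lost score.

```python
import numpy as np

def log_score(probs, outcomes):
    # Mean log-likelihood of the realized binary outcomes (higher is better).
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return np.mean(outcomes * np.log(probs) + (1 - outcomes) * np.log(1 - probs))

rng = np.random.default_rng(0)
true_rate = 0.7                         # hypothetical: events occur 70% of the time
outcomes = rng.random(100_000) < true_rate

# A calibrated forecast scores better in expectation than an overconfident one,
# so optimizing log score pushes you toward calibration as a byproduct.
print(log_score(np.full(outcomes.shape, 0.7), outcomes))  # ≈ -0.611 (calibrated)
print(log_score(np.full(outcomes.shape, 0.9), outcomes))  # ≈ -0.764 (overconfident)
```

The analogy to the prize: score the thing you ultimately care about (useful output, log score) and let the secondary property (counterfactualness, calibration) improve as a side effect, rather than scoring the secondary property directly.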
If I understand correctly, one of the goals of this initiative is to increase the prestige associated with making useful contributions to AI safety. For that purpose, it doesn’t matter whether the prize incentivized the winning authors or not. But it is important that enough people trust that the main criterion for selecting the winning works is usefulness.
My take on this is that the ideal version of this prize selects for both usefulness and counterfactualness, but selecting for counterfactualness without producing weird side effects seems hard. (I do think it’s worth spending an hour or two thinking about how to properly incentivize or reward counterfactualness; it’s just that, if you haven’t come up with anything, strictly rewarding quality/usefulness seems better.)
> selecting for counterfactualness without producing weird side effects seems hard
Agreed, I just thought the winner in this case was over the top enough to fall not in the fuzzy boundary but clearly on the other side of it.
Our rules don’t draw that boundary at the moment, and I’m not even sure how it could be phrased. Do you have any suggestions?
I wouldn’t be in favor of adding explicit rules, for Goodhart-related reasons. I think prizes and grants should have the minimum rules needed to handle basic logistics, and the rest should be illegible.
Ah, yeah that makes sense.