Galaxy-brained reason not to work on AI alignment: anti-aligned ASI is orders of magnitude worse than aligned ASI is good, so it’s better to ensure that the values of the Singularity are more or less orthogonal to CEV (which happens by default).
I see your point as warning against approaches that are like “get the AI entangled with stuff about humans and hope that helps”.
There are other approaches with a goal more like “make it possible for the humans to steer the thing and have scalable oversight over what’s happening”.
So my alternative take is: a solution to AI alignment should include the ability for the developers to notice if the utility function is borked by a minus sign!
And if you wouldn’t notice something as wrong as a minus sign, you’re probably in trouble about noticing other misalignment.
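To make “notice the minus sign” concrete, here’s a minimal sketch of the kind of check I have in mind (the names `check_reward_sign`, `reward_fn`, and the probe pairs are made up for illustration, not any lab’s actual tooling): probe the learned reward/utility function with reference pairs whose ordering humans already agree on. A minus-sign bug inverts every comparison, so even a handful of probes catches it before any optimization pressure is applied.

```python
# Hypothetical sanity check: a flipped reward sign inverts every human-agreed
# comparison, so a few reference probes catch it before training starts.

def check_reward_sign(reward_fn, probe_pairs):
    """reward_fn: callable mapping a sample to a scalar reward.
    probe_pairs: list of (clearly_better, clearly_worse) reference samples."""
    inverted = sum(
        reward_fn(better) <= reward_fn(worse)
        for better, worse in probe_pairs
    )
    if inverted == len(probe_pairs):
        raise ValueError("Every probe pair is inverted -- the reward sign is likely flipped.")
    if inverted > 0:
        print(f"Warning: {inverted}/{len(probe_pairs)} probes disagree with human judgment.")

# e.g. check_reward_sign(reward_model.score, [(helpful_reply, threatening_reply), ...])
```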
I had a long back-and-forth about that topic here. Among other things, I disagree that “more or less orthogonal to CEV” is the default in the absence of alignment research, because people will presumably be trying to align their AIs, and I think there will be obvious techniques which work well enough to get out of the “random goal” regime, but not well enough for reliability.
Worth remembering that sign flips of the reward function do happen in practice: https://openai.com/blog/fine-tuning-gpt-2/#bugscanoptimizeforbadbehavior — the GPT-2 fine-tuning bug that flipped the reward sign and made the model optimize for the outputs humans rated worst (“Was this a loss to minimize or a reward to maximize...”).
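To illustrate the failure mode, here’s a toy sketch (not the code from that incident): in a policy-gradient update, the same scalar used with the wrong sign convention trains the policy toward the lowest-reward behavior, and nothing crashes or looks obviously broken.

```python
import torch

# Toy illustration of the loss-vs-reward sign convention trap (not the code
# from the incident above). The correct policy-gradient step maximizes reward,
# i.e. minimizes -reward * logprob; dropping the minus sign silently trains
# the policy toward whatever the reward model scores lowest.

logprob = torch.tensor(-1.2, requires_grad=True)  # log-prob of the sampled action
reward = torch.tensor(3.0)                        # scalar from the reward model

correct_loss = -(reward * logprob)  # gradient descent pushes logprob up: reinforce the action
buggy_loss = reward * logprob       # gradient descent pushes logprob down: suppress the action

correct_loss.backward()
grad_correct = logprob.grad.clone()   # tensor(-3.)
logprob.grad = None
buggy_loss.backward()
grad_buggy = logprob.grad.clone()     # tensor(3.)

print(grad_correct, grad_buggy)  # same magnitude, opposite direction; training still "works"
```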
people trying to align their AIs == alignment research