Thanks for writing this; I think this is a common and pretty rough experience.
Have you considered doing cybersecurity work related to AI safety? That is, work that would help prevent bad actors from stealing model weights and prevent the AIs themselves from escaping. I think this kind of work would likely be more useful than most alignment work.
I’d recommend reading Holden Karnofsky’s takes, as well as the recent huge RAND report on securing model weights. Redwood’s control agenda might also be relevant.
I think this kind of work is probably extremely useful and somewhat neglected; it especially seems to be missing people who both know about cybersecurity and care about AGI/alignment.
I note that I am uncertain whether working on such a task would increase or decrease global stability and the risk of great power conflict.
Working on this seems good insofar as greater control implies more options. With good security, it’s still possible to opt in to whatever weight-sharing / transparency mechanisms seem net positive, including with adversaries. Without security, there’s no such option.
Granted, the [more options are likely better] conclusion is clearer if we condition on wise strategy.
However, [we have great security, therefore we’re sharing nothing with adversaries] is clearly not a valid inference in general.
Not necessarily. If we have the option to hide information, then even if we reveal information, adversaries may still assume (likely correctly) we aren’t sharing all our information, and that we are closer to a decisive strategic advantage than we appear, even in the case where we do share all our information (which we won’t).
Of course the [more options are likely better] conclusion holds if the lumbering, slow, disorganized, and collectively stupid organizations which have those options somehow execute the best strategy, but they’re not actually going to take the best strategy, especially when it comes to US-China relations.
ETA:
“[we have great security, therefore we’re sharing nothing with adversaries] is clearly not a valid inference in general.”
I don’t think the conclusion holds if that is true in general, and I don’t think I ever assumed or argued that it was.
“then even if we reveal information, adversaries may still assume (likely correctly) we aren’t sharing all our information”
I think the same reasoning applies if they hack us: they’ll assume that the stuff they were able to hack was the part we left suspiciously vulnerable, and that the really important information is behind more serious security.
I expect they’ll assume we’re in control either way, once the stakes are really high. It seems preferable to actually be in control.
I’ll grant that it’s far from clear that the best strategy would be used.
(apologies if I misinterpreted your assumptions in my previous reply)