Yeah that looks good, except that it takes an order of magnitude longer to get going on conceptual alignment directions. I’ll message Adam to hear what happened with that.
For reference there’s this: What I learned running Refine
When I talked to Adam about this (over 12 months ago), he didn’t think there was much to say beyond what’s in that post. Perhaps he’s updated since.
My sense is that I view it as more of a success than Adam does. In particular, I think it’s a bit harsh to solely apply the [genuinely new directions discovered] metric. Even when doing everything right, I expect the hit rate to be very low there, with [variation on current framing/approach] being the most common type of success.
Agreed that Refine’s timescale is clearly too short. However, a much longer program would set a high bar for whoever’s running it. Personally, I’d only be comfortable doing so if the setup were flexible enough that it didn’t seem likely to limit the potential of participants (by being less productive-in-the-sense-desired than counterfactual environments).
In particular, I think it’s a bit harsh to solely apply the [genuinely new directions discovered] metric. Even when doing everything right, I expect the hit rate to be very low there, with [variation on current framing/approach] being the most common type of success.
Mhm. In fact I’d want to apply a bar that’s even lower, or at least different: [the extent to which the participants (as judged by more established alignment thinkers) seem to be well on the way to developing new promising directions—e.g. being relentlessly resourceful including at the meta-level; having both appropriate Babble and appropriate Prune; not shying away from the hard parts].
the setup were flexible enough that it didn’t seem likely to limit the potential of participants (by being less productive-in-the-sense-desired than counterfactual environments).
Agree that this is an issue, but I think it can be addressed—certainly at least well enough that there’d be worthwhile value-of-info in running such a thing.
I’d be happy to contribute a bit of effort, if someone else is taking the lead. I think most of my efforts will be directed elsewhere, but for example I’d be happy to think through what such a program should look like; help write justificatory parts of grant applications; and possibly mentor or similar.
Ah thanks!
Report back if you get details, I’m curious.
See Joe’s sibling comment: https://www.lesswrong.com/posts/QzQQvGJYDeaDE4Cfg/talent-needs-in-technical-ai-safety#JP5LA9cNgqxgdAz8Z
I have, and I also remember seeing Adam’s original retrospective, but I always found it unsatisfying. Thanks anyway!