One somewhat obvious thing to do with ~human-level systems with “initial loose alignment” is to use them to automate alignment research (e.g. the superalignment plan). I think this kind of two-step plan is currently the best we have, and probably by quite some margin. There are many more details on why I believe this in these slides and in this AI Safety Camp ’24 proposal.
I made time to go through your slides. You appear to be talking about LLMs, not language model agents. That’s what I’m addressing.
If you can align language model agents, using them to align a different type of AGI would be somewhat beside the point in most scenarios (unless they progressed so slowly that another type of AGI overtook them before they could pull off a pivotal act using LMA AGI).
I don’t see a barrier to LMAs achieving full, agentic AGI. And I think they’ll be so useful and interesting that they’ll inevitably be built, and fairly efficiently at that.
I don’t quite understand why others don’t agree that this will happen. Perhaps I’ll write a question post asking why.
Agreed, I do mostly discuss LLMs, but I think there’s significant overlap between aligning LLMs and aligning LMAs.
I also agree that LMAs could scale all the way, but once you get ~human-level automated alignment research, its likely applicability to other types of systems (beyond LLMs and LMAs) should still be a nice bonus.