I’ve now seen this meme overused to such a degree that I find it hard to take seriously anything written after it. To me it just comes across as unserious if somebody apparently cannot imagine how this might happen, even after obvious (to me, at least) early demos/prototypes have been published, e.g. https://sakana.ai/ai-scientist/, Discovering Preference Optimization Algorithms with and for Large Language Models, and A Multimodal Automated Interpretability Agent.
On a positive note, though, at least they didn’t also bring up the ‘Godzilla strategies’ meme.
For what it’s worth, as someone in basically the position you describe—I struggle to imagine automated alignment working, mostly because of Godzilla-ish concerns—demos like these do not strike me as cruxy. I’m not sure what the cruxes are, exactly, but I’m guessing they’re more about things like relative enthusiasm about prosaic alignment, the relative likelihood of sharp left turn-type problems, etc., than about whether early automated demos are likely to work on early systems.
Maybe you want to call these concerns unserious too, but regardless, I do think it’s worth bearing in mind that early results like these may look like stronger or more relevant evidence to people whose prior is already that scaled-up versions of them would meaningfully help with aligning a superintelligence.