jacquesthibs comments on The Field of AI Alignment: A Postmortem, and What To Do About It

jacquesthibs 26 Dec 2024 19:24 UTC
10 points
0
Putting venues aside, I’d like to build software (e.g. AI-aided) to make it easier for the physics post-docs to onboard to the field and focus on the ‘core problems’ in ways that prevent recoil as much as possible. One worry I have with ‘automated alignment’-type approaches is that they similarly succumb to the streetlight effect, because both models and researchers are biased towards the types of problems you mention. By default, the models will also likely just be much better at prosaic-style safety than they will be at the ‘core problems’. I would like to instead design software that makes it easier to direct their cognitive labour towards the core problems.
I have many thoughts/ideas about this, but I was wondering if anything comes to mind for you beyond ‘dedicated venues’ and maybe writing about it.