The Reversal Curse seems to match the lack of bidirectionality in ROME edits mentioned here: https://www.alignmentforum.org/posts/QL7J9wmS6W2fWpofd/but-is-it-really-in-rome-an-investigation-of-the-rome-model
voyantvoid
Re: claim 25, about the need for research organisations: my first thought is that government national security organisations might be suitable venues for this kind of research, as they have several apparent advantages:
Large budgets
Existing culture and infrastructure for research in secret with internal compartmentalisation
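Comparatively good track record for keeping results secret in cryptography, such as GCHQ's early discovery of public-key cryptography and an RSA-equivalent scheme, or the NSA's prior knowledge of differential cryptanalysis when DES was designed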
Routes to internal prestige and advancement without external publication
Preventing the creation of unaligned AI would accord with their national security goals
However, they may introduce problems of their own:
Clearance requirements limit the talent pool that can work with them
As government organisations with less of a start-up culture, they may be less accommodating of this kind of research
A leak revealing that one organisation is researching this area could trigger an international arms race
Any tools they develop that are suitable for public release may be seen as untrustworthy by association, as with the skepticism towards the NSA's cryptography advice
A research group would be more beholden to higher-ups, who would likely be less convinced of the need for alignment work than for capability work
Has this option been discussed already?
For the Skunk Works and SpaceX examples, I found myself wondering whether some features, like powerful, decisive managers, are strictly better, or whether they merely increase variance and so appear more often when we look only at the most successful projects. I haven't read much of the primary and secondary sources for progress studies; how easy would it be to find details of the practices of average or failed projects to compare against?
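The variance worry can be illustrated with a toy simulation. Every number here is an arbitrary assumption, not data about Skunk Works, SpaceX or any real project: a hypothetical "risky" strategy (standing in for powerful, decisive management) has a slightly *lower* average outcome than a "steady" one but twice the spread, and we then look only at the top 1% of projects.

```python
import random
import statistics


def simulate(n_per_strategy=10_000, top_fraction=0.01, seed=0):
    """Toy model of selecting on outcomes.

    Returns the share of the top projects that came from the high-variance
    strategy, plus the mean outcome of each strategy. All parameters are
    illustrative assumptions, not estimates from real projects.
    """
    rng = random.Random(seed)
    # Hypothetical "steady" strategy: mean outcome 0, low spread.
    steady = [rng.gauss(0.0, 1.0) for _ in range(n_per_strategy)]
    # Hypothetical "risky" strategy: slightly worse on average (-0.2),
    # but with twice the spread.
    risky = [rng.gauss(-0.2, 2.0) for _ in range(n_per_strategy)]

    # Pool all projects and keep only the best-performing fraction,
    # as when we study only famous successes.
    pooled = [(x, "steady") for x in steady] + [(x, "risky") for x in risky]
    pooled.sort(key=lambda pair: pair[0], reverse=True)
    k = int(len(pooled) * top_fraction)
    top = pooled[:k]

    risky_share = sum(1 for _, label in top if label == "risky") / k
    return risky_share, statistics.mean(steady), statistics.mean(risky)


if __name__ == "__main__":
    share, mean_steady, mean_risky = simulate()
    print(f"mean outcome, steady strategy: {mean_steady:.2f}")
    print(f"mean outcome, risky strategy:  {mean_risky:.2f}")
    print(f"share of top 1% from risky strategy: {share:.0%}")
```

Under these assumptions, nearly all of the top 1% come from the risky strategy even though it does worse on average, which is why the average and failed projects are the comparison that would distinguish the two explanations.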