I want to flesh out one particular rushed unreasonable developer scenario that I’ve been thinking about lately: there are ten people inside the AI company who are really concerned about catastrophic risk from misalignment. The AI company as a whole pays lip service to AI risk broadly construed and talks occasionally about risk from AGI, but it doesn’t take misalignment risk in particular (perhaps especially risk from schemers) very seriously.
[...]
What should these people try to do? The possibilities are basically the same as what a responsible developer might do:
Build concrete evidence of risk, to increase political will towards reducing misalignment risk.
Implement safety measures that reduce the probability of the AI escaping or causing other big problems.
Do alignment research with the goal of producing a model that you’re as comfortable as possible deferring to.
If the second and third bullet points turn out to be too ambitious for the dire situation you envision, another thing to invest some effort in is fail-safe measures: interventions where you don’t expect to make things go well overall, but where you can still try to avert failure modes that seem exceptionally bad.
I’m not sure how realistic this is in practice, but it makes sense to stay on the lookout for such opportunities.