Alignment is also perceived as fragile. Almost every paradigm of alignment and AI safety research (interpretability, agent foundations, prosaic alignment, model encryption, etc.) is regularly criticised on LW by different people as, at best, totally ineffectual from an opportunity-cost perspective, and, at worst, downright harmful, whether through unforeseen effects or by serving as safety-washing for AGI labs. (I myself am guilty of many such criticisms.)
OTOH, this very work on the strategy and methodology of AI safety development could itself reasonably be criticised as worsening the psychological state of AI safety researchers, and therefore as potentially net harmful despite its marginal improvements to strategy and methodology (if those improvements even materialise in practice, which is not clear to me).