Good point. You’re right to highlight the importance of the offense-defense balance in determining the difficulty of high-impact tasks, rather than alignment difficulty alone. This is a crucial point that I’m planning to expand on in the next post in this sequence.
Many things determine the overall difficulty of HITs:
- The “intrinsic” offense-defense balance in related fields (like biotechnology, weapons technology, and cybersecurity), and especially whether there are decisively offense-dominant technologies that transformative AI can develop and that can’t be countered.
- Overall alignment difficulty, which affects whether we should expect to see a large number of strategic, power-seeking unaligned systems or just systems engaging in more mundane reward hacking and sycophancy.
- Technology diffusion rates, especially for anything offense-dominant, e.g. whether we should expect frontier models to leak or be deliberately open-sourced.
- Geopolitical factors, e.g. whether there are adversary countries or large numbers of other well-resourced rogue actors to worry about, not just accidents, leaks, and random individuals.
- The development strategy, e.g. whether the AI technologies are being proactively developed by a government, in a public-private partnership, or by companies that can’t or won’t use them protectively.
My rough suspicion is that all of these factors matter quite a bit, but since we’re looking at “the alignment problem” in this post, I’m pretending that everything else is held fixed.
The intrinsic offense-defense balance of whatever is next on the ‘tech tree’, as you noted, is maybe the most important overall, as it affects the feasibility of defensive measures and could push towards more aggressive strategies in cases of strong offense advantage. It’s also extremely difficult to predict ahead of time.